Methods for non-invasive prenatal ploidy calling

ABSTRACT

Disclosed herein are methods for determining the copy number of a chromosome in a fetus in the context of non-invasive prenatal diagnosis. In an embodiment, the measured genetic data from a sample of genetic material that contains both fetal DNA and maternal DNA is analyzed, along with the genetic data from the biological parents of the fetus, and the copy number of the chromosome of interest is determined. In an embodiment, the maternal serum is measured using a single-nucleotide polymorphism (SNP) microarray, along with parental genomic data, and the determination of the chromosome copy number is used to make clinical decisions pertaining to the fetus.

RELATED APPLICATIONS

This application is a divisional of U.S. Utility application Ser. No. 15/252,795, filed Aug. 31, 2016. U.S. Utility application Ser. No. 15/252,795 is a continuation of U.S. Utility application Ser. No. 14/983,128, filed Dec. 29, 2015, which is a continuation of U.S. Utility application Ser. No. 14/080,656, filed Nov. 14, 2013, now U.S. Pat. No. 9,228,234, issued Jan. 5, 2016, which is a continuation of U.S. Utility application Ser. No. 13/896,293, filed May 16, 2013, now Abandoned, which is a continuation of U.S. Utility application Ser. No. 13/499,086, filed Mar. 29, 2012, now Abandoned, which is a national phase filing under 35 U.S.C. § 371 of International Application No. PCT/US10/50824, filed Sep. 30, 2010, which claims the benefit of the following U.S. Provisional Patent Applications: Ser. No. 61/277,876, filed Sep. 30, 2009; Ser. No. 61/337,931, filed Feb. 12, 2010; and Ser. No. 61/395,850, filed May 18, 2010; all the disclosures thereof are incorporated by reference herein in their entirety.

BACKGROUND

A human being normally has two sets of 23 chromosomes in every somatic cell, with one set coming from each parent. Aneuploidy, a state where a cell has the wrong number of chromosomes, is responsible for a significant percentage of children born with genetic conditions. Detection of chromosomal abnormalities can identify individuals, including fetuses or embryos, with conditions such as Down syndrome, Edwards syndrome, Klinefelter syndrome, and Turner syndrome, among others. Since chromosomal abnormalities are generally undesirable, the detection of such a chromosomal abnormality in a fetus may provide the basis for the decision to terminate a pregnancy.

Prenatal diagnosis can alert physicians and parents to abnormalities in growing fetuses. Some currently available methods, such as amniocentesis and chorionic villus sampling (CVS), are able to diagnose genetic defects with high accuracy; however, they may carry a risk of spontaneous abortion. Other methods can indirectly estimate the risk of certain genetic defects non-invasively, for example from hormone levels in maternal blood and/or from ultrasound data; however, their accuracy is much lower. It has recently been discovered that cell-free fetal DNA and intact fetal cells can enter the maternal blood circulation. This provides an opportunity to measure genetic information about a fetus directly, specifically the aneuploidy state of the fetus, in a manner that is non-invasive, for example from a maternal blood draw.

SUMMARY

Methods for non-invasive prenatal ploidy calling are disclosed herein. In an embodiment of the present disclosure, methods are disclosed for determining the ploidy state of a target individual whose measured genetic material is contaminated with genetic material of the mother, by using knowledge of the maternal genetic data. This is in contrast to methods that are able to determine the ploidy state of a target individual from genetic data that is noisy due to poor measurements; the contamination in that data is random. This is also in contrast to methods that are able to determine the ploidy state of a target individual from genetic data that is difficult to interpret because of contamination by DNA from unrelated individuals; the contamination in that data is genetically random. In an embodiment, the methods disclosed herein are able to determine the ploidy state of a target individual when the difficulty of interpretation is due to contamination with DNA from a parent; such contamination is at least half identical to the target data, and is therefore difficult to correct for. To achieve this, in an embodiment a method of the present disclosure uses knowledge of the contaminating maternal genotype to create a model of the expected genetic measurements given a mixture of the maternal and the target genetic material, where the target genetic data is not known a priori. This step is not necessary where the uncertainty in the genetic data is due only to random noise.
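As a rough illustration of such a model, the sketch below assumes a biallelic SNP with alleles A and B, a mother and fetus that are both disomic at that SNP, a known fetal fraction `f`, and a measurement equal to the fraction of B signal in the mixed sample. Because the fetal genotype is not known a priori, the expected measurement is a mixture over the fetal genotypes permitted by Mendelian inheritance; the sketch also uses the paternal genotype, as in embodiments where it is available. The function names are hypothetical and are not part of the disclosed method.

```python
from collections import defaultdict
from itertools import product
from typing import Dict


def child_genotypes(mother: str, father: str) -> Dict[str, float]:
    """Mendelian genotype distribution of a disomic child at a biallelic SNP.

    Genotypes are two-letter strings over the alleles 'A' and 'B', e.g. "AB".
    """
    dist: Dict[str, float] = defaultdict(float)
    for m, p in product(mother, father):  # each parent transmits one allele, each with prob 1/2
        dist["".join(sorted(m + p))] += 0.25
    return dict(dist)


def expected_b_fractions(mother: str, father: str, f: float) -> Dict[float, float]:
    """Possible B-allele fractions of the mixed sample, with their probabilities.

    f is the fraction of DNA in the sample contributed by the fetus; mother and
    fetus are both assumed to be disomic at this SNP.
    """
    maternal_b = mother.count("B") / 2
    out: Dict[float, float] = defaultdict(float)
    for child, prob in child_genotypes(mother, father).items():
        frac = (1 - f) * maternal_b + f * child.count("B") / 2
        out[round(frac, 10)] += prob  # round so that equal fractions share a key
    return dict(out)
```

For example, with a mother AA, a father AB and `f = 0.2`, `expected_b_fractions("AA", "AB", 0.2)` returns `{0.0: 0.5, 0.1: 0.5}`: half the time the fetus is AA and contributes no B signal, and half the time it is AB and contributes B signal at `f / 2`.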

According to aspects illustrated herein, there is provided a method that enables the determination of the ploidy state of a target individual using genetic material from the target individual when the target individual's genetic material is contaminated by other genetic material. In an embodiment, the target individual is a fetus, the target individual's genetic data comprises free floating DNA found in maternal blood, and the contaminating genetic material comprises free floating maternal DNA also found in maternal blood. In an embodiment, the target individual is a fetus, the target individual's genetic data comprises DNA found in fetal cells found in maternal blood, and the contaminating genetic material comprises DNA found in maternal cells also found in maternal blood. In an embodiment, the target individual is a fetus, the determination of the ploidy state is done in the context of non-invasive prenatal diagnosis, and a clinical decision is made based on the ploidy state determination. In an embodiment, genetic data from one or both parents of the target individual is used in the determination of the ploidy state of the target. In an embodiment, the chromosomes of interest include chromosomes 13, 18, 21, X and Y. In an embodiment, the determination is transformed into a report which may be sent to a relevant healthcare practitioner. In an embodiment, the series of steps outlined above results in a transformation of the genetic matter of a pregnant mother and the father into an actionable decision that results in a pregnancy being continued or terminated. In an embodiment, the ploidy state determination is used to make a clinical decision. In an embodiment, the clinical decision may be to terminate a pregnancy where the fetus is found to have a genetic abnormality.

While the disclosure focuses on genetic data from human subjects, and more specifically on developing fetuses, as well as related individuals, it should be noted that the methods disclosed apply to the genetic data of a range of organisms, in a range of contexts. The techniques described for making ploidy determinations are most relevant in the context of prenatal diagnosis in conjunction with amniocentesis, chorionic villus biopsy, fetal tissue sampling, and non-invasive prenatal diagnosis, where a small quantity of fetal genetic material is isolated from maternal blood, for example prenatal serum screens, the triple test, and the quad test. The use of this method may facilitate diagnoses focusing on inheritable diseases, chromosome copy number predictions, increased likelihoods of defects or abnormalities, as well as predictions of susceptibility to various disease and non-disease phenotypes for individuals to enhance clinical and lifestyle decisions.

In an embodiment of the present disclosure, the fetal or embryonic genomic data, with or without the use of genetic data from related individuals, can be used to detect whether the cell is aneuploid, that is, whether the wrong number of one or more autosomal chromosomes is present in an individual, and/or whether the wrong number of sex chromosomes is present in the individual. The genetic data can also be used to detect uniparental disomy, a condition in which two copies of a given chromosome are present, both of which originate from one parent. This is done by creating a set of hypotheses about the potential states of the DNA, and testing to see which hypothesis has the highest probability of being true given the measured data.
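The hypothesis-testing step can be sketched as a Bayesian comparison, assuming that a log-likelihood of the measured data has already been computed for each hypothesis. The posterior probability of the called hypothesis doubles as a confidence for the call. The function name, the uniform default prior and the log-sum-exp normalization are illustrative choices, not details taken from this disclosure.

```python
import math
from typing import Dict, Optional, Tuple


def call_hypothesis(
    log_likelihoods: Dict[str, float],
    priors: Optional[Dict[str, float]] = None,
) -> Tuple[str, float, Dict[str, float]]:
    """Return (called hypothesis, its posterior probability, all posteriors)."""
    if priors is None:  # uniform prior over the candidate hypotheses
        priors = {h: 1.0 / len(log_likelihoods) for h in log_likelihoods}
    # Unnormalized log posterior: log P(data | H) + log P(H).
    log_post = {h: ll + math.log(priors[h]) for h, ll in log_likelihoods.items()}
    # Log-sum-exp normalization avoids underflow when likelihoods are tiny.
    top = max(log_post.values())
    log_z = top + math.log(sum(math.exp(v - top) for v in log_post.values()))
    posteriors = {h: math.exp(v - log_z) for h, v in log_post.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors[best], posteriors
```

With log-likelihoods of -10 for disomy and -12 for trisomy and a uniform prior, the sketch calls disomy with a posterior of 1/(1 + e^-2), about 0.88; a sufficiently strong prior against disomy can reverse the call.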

In an embodiment of the present disclosure, the small amount of genetic material of a fetus, which may be mixed with maternal genetic material, may be transformed through amplification into a large amount of genetic material that encodes similar or identical genetic data. The genetic data contained molecularly in the large amount of genetic material may be transformed into raw genetic data in the form of digital signals, optionally stored in computer memory, by way of a genotyping method. The raw genetic data may be transformed, by way of the PARENTAL SUPPORT™ method, into copy number calls for one or a number of chromosomes, also optionally stored in computer memory. The copy number call may be transformed into a report for a physician, who may then act on the information in the report.

In an embodiment of the present disclosure, the direct measurements of genetic material, amplified or unamplified, present at a plurality of loci, can be used to detect monosomy, uniparental disomy, matched trisomy, unmatched trisomy, tetrasomy, and other aneuploidy states. One embodiment of the present disclosure takes advantage of the fact that under some conditions, the average level of amplification and measurement signal output is invariant across the chromosomes, and thus the average amount of genetic material measured at a set of neighboring loci will be proportional to the number of homologous chromosomes present, and the ploidy state may be called in a statistically significant fashion. In another embodiment, different alleles have statistically different characteristic amplification profiles given a certain parent context and a certain ploidy state; these characteristic differences can be used to determine the ploidy state of the chromosome.
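A minimal sketch of this proportionality argument follows, assuming an unmixed sample in which signal scales linearly with the number of copies, and a reference window of loci whose copy number is known (for example, a chromosome expected to be disomic). Rounding to the nearest integer and the one-sample z-score, which ignores uncertainty in the reference mean, are simplifying assumptions; `copy_number_call` is a hypothetical name.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def copy_number_call(
    target: Sequence[float],
    reference: Sequence[float],
    reference_copies: int = 2,
) -> Tuple[int, float, float]:
    """Estimate copy number from the mean signal over a window of neighboring loci.

    Returns (rounded call, continuous estimate, z-score of the estimate against
    reference_copies). Assumes signal scales linearly with copy number.
    """
    scale = reference_copies / mean(reference)            # copies per unit of signal
    estimate = scale * mean(target)
    std_err = scale * stdev(target) / sqrt(len(target))   # standard error of the estimate
    z = (estimate - reference_copies) / std_err
    return round(estimate), estimate, z
```

For instance, target signals averaging 1.5 against a reference window averaging 1.0 give a continuous estimate of 3.0 copies, which the sketch rounds to a call of 3.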

In an embodiment of the present disclosure, calculated, phased, reconstructed and/or determined genetic data from the target individual and/or from one or more related individuals may be used as input for a ploidy calling aspect of the present disclosure.

In an embodiment, a method for determining a copy number of a chromosome of interest in a target individual, using genotypic measurements made on genetic material from the target individual, wherein the genetic material of the target individual is mixed with genetic material from the mother of the target individual, comprises obtaining genotypic data for a set of SNPs of the parents of the target individual; making genotypic measurements for the set of SNPs on a mixed sample that comprises DNA from the target individual and also DNA from the mother of the target individual; creating, on a computer, a set of ploidy state hypotheses for the chromosome of interest of the target individual; determining, on the computer, the probability of each of the hypotheses given the genetic measurements of the mixed sample and the genetic data of the parents of the target individual; and using the determined probabilities of each hypothesis to determine the most likely copy number of the chromosome of interest in the target individual.

In an embodiment, the target individual is a fetus. In an embodiment, the copy number determination is used to make a clinical decision. In an embodiment, the target individual is a fetus, and the clinical decision is to terminate a pregnancy where the fetus is found to have a genetic abnormality, or to not terminate the pregnancy where the fetus is not found to have a genetic abnormality. In an embodiment, the set of SNPs comprises a plurality of SNPs from the chromosome of interest, and a plurality of SNPs from at least one chromosome that is expected to be disomic in the target individual.

In an embodiment, the step of determining, on the computer, the probability of each of the hypotheses comprises using the genotypic data of the parents to determine parental contexts for each of the SNPs; grouping the genotypic measurements of the mixed sample into the parental contexts; using the grouped genotypic measurements from at least one chromosome that is expected to be disomic to determine a platform response; using the grouped genotypic measurements from at least one chromosome that is expected to be disomic to determine a ratio of fetal to maternal DNA in the mixed sample; using the determined platform response and the determined ratio to predict an expected distribution of SNP measurements for each set of SNPs in each parental context under each hypothesis; and calculating the probabilities that each of the hypotheses is true given the platform response, and given the ratio, and given the grouped genotypic measurements of the mixed sample, and given the predicted expected distributions, for each parental context, for each hypothesis.
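A heavily simplified sketch of these steps follows. It assumes a single parental context (mother AA, father BB, so any B signal comes from the paternal copies in the fetus), an identity platform response (the measured B-allele fraction equals the true fraction plus Gaussian noise), a noise level on the chromosome of interest equal to that on the disomic reference chromosome, and a hypothetical hypothesis set; the disclosed method instead determines the platform response and uses all parental contexts. The fetal fraction is estimated from the disomic chromosome, where the expected B fraction is f/2. `HYPOTHESES`, `expected_b_fraction`, `gaussian_loglik` and `score_hypotheses` are hypothetical names.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple

# Hypothetical hypothesis set for the chromosome of interest, written as
# (fetal copies inherited from the mother, fetal copies inherited from the father).
HYPOTHESES: Dict[str, Tuple[int, int]] = {
    "maternal-only monosomy": (1, 0),
    "paternal-only monosomy": (0, 1),
    "disomy": (1, 1),
    "maternal trisomy": (2, 1),
    "paternal trisomy": (1, 2),
}


def expected_b_fraction(f: float, maternal: int, paternal: int) -> float:
    """Expected B-allele fraction at SNPs where the mother is AA and the father is BB.

    In this context every maternal copy carries A and every paternal copy carries B.
    f is the fetal fraction on a disomic chromosome, so a fetal chromosome present
    in n copies contributes f * n / 2 of the signal against the mother's 1 - f.
    """
    fetal_total = f * (maternal + paternal) / 2
    fetal_b = f * paternal / 2
    return fetal_b / ((1 - f) + fetal_total)


def gaussian_loglik(xs: Sequence[float], mu: float, sigma: float) -> float:
    """Log-likelihood of measurements xs under a normal distribution N(mu, sigma^2)."""
    norm = math.log(sigma * math.sqrt(2 * math.pi))
    return sum(-0.5 * ((x - mu) / sigma) ** 2 - norm for x in xs)


def score_hypotheses(reference: Sequence[float], target: Sequence[float]) -> Dict[str, float]:
    """Log-likelihood of each hypothesis for the chromosome of interest.

    reference: AA|BB measurements from a chromosome expected to be disomic.
    target:    AA|BB measurements from the chromosome of interest.
    """
    f = 2 * mean(reference)   # under disomy the expected B fraction is f / 2
    sigma = stdev(reference)  # noise assumed equal on both chromosomes
    return {
        name: gaussian_loglik(target, expected_b_fraction(f, m, p), sigma)
        for name, (m, p) in HYPOTHESES.items()
    }
```

With reference measurements around 0.20 (so f is about 0.4) and target measurements around 0.166, the maternal trisomy hypothesis, whose expected fraction is 0.2 / 1.2, scores highest. The resulting log-likelihoods can then be combined with priors to give a probability for each hypothesis.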

In an embodiment, the chromosome of interest is selected from the group consisting of chromosome 13, chromosome 18, chromosome 21, the X chromosome, the Y chromosome, and combinations thereof. In an embodiment, the method is used to determine the copy number of a number of chromosomes in the target individual, where the number is selected from the group consisting of one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty one, twenty two, and twenty three.

In an embodiment, the mixed sample is maternal blood, maternal plasma or some other substance taken from a pregnant mother. In an embodiment, the target individual's genetic material is free floating DNA found in maternal blood or serum. In an embodiment, the target individual's genetic material is nuclear DNA found in one or more cells from the target individual. In an embodiment, a confidence is computed for the chromosome copy number determination. In an embodiment, the ratio of fetal to maternal DNA in the mixed sample is determined for individual chromosomes.

In an embodiment, the step of obtaining genotypic data, and/or the step of making genotypic measurements, is done by measuring genetic material using techniques selected from the group consisting of padlock probes, circularizing probes, genotyping microarrays, SNP genotyping assays, chip based microarrays, bead based microarrays, other SNP microarrays, other genotyping methods, Sanger DNA sequencing, pyrosequencing, high throughput sequencing, reversible dye terminator sequencing, sequencing by ligation, sequencing by hybridization, other methods of DNA sequencing, other high throughput genotyping platforms, fluorescent in situ hybridization (FISH), comparative genomic hybridization (CGH), array CGH, and multiples or combinations thereof. In an embodiment, the step of measuring genetic material is done on genetic material that is amplified, prior to being measured, using a technique that is selected from the group consisting of Polymerase Chain Reaction (PCR), ligation-mediated PCR, degenerate oligonucleotide-primed PCR, Multiple Displacement Amplification (MDA), allele-specific PCR, allele-specific amplification techniques, bridge amplification, padlock probes, circularizing probes, and combinations thereof.

In an embodiment, the step of determining the copy number of the chromosome of interest is performed for the purpose of screening for a chromosomal condition where the chromosomal condition is selected from the group consisting of nullsomy, monosomy, disomy, uniparental disomy, euploidy, trisomy, matched trisomy, unmatched trisomy, maternal trisomy, paternal trisomy, tetrasomy, matched tetrasomy, unmatched tetrasomy, other aneuploidy, unbalanced translocation, balanced translocation, recombination, deletion, insertion, mosaicism, and combinations thereof.

In an embodiment, the method is used for the purpose of paternity testing.

In an embodiment, a method for determining a copy number of a chromosome of interest in a target individual, using genotypic measurements made on genetic material from the target individual, wherein the genetic material of the target individual is mixed with genetic material from the mother of the target individual, comprises obtaining genotypic data for a set of SNPs of the mother of the target individual; making genotypic measurements for the set of SNPs on a mixed sample that comprises DNA from the target individual and also DNA from the mother of the target individual; creating, on a computer, a set of ploidy state hypotheses for the chromosome of interest of the target individual; determining, on the computer, the probability of each of the hypotheses given the genetic measurements of the mixed sample and the genetic data of the parents of the target individual; and using the determined probabilities of each hypothesis to determine the most likely copy number of the chromosome of interest in the target individual.

It will be recognized by a person of ordinary skill in the art, given the benefit of this disclosure, that various aspects and embodiments of this disclosure may be implemented in combination or separately.

BRIEF DESCRIPTION OF THE DRAWINGS

The presently disclosed embodiments will be further explained with reference to the attached drawings, wherein like structures are referred to by like numerals throughout the several views. The drawings shown are not necessarily to scale, with emphasis instead generally being placed upon illustrating the principles of the presently disclosed embodiments.

FIGS. 1A and 1B show a model fit for both the X (left plot, FIG. 1A) and Y (right plot, FIG. 1B) channels in a sample with 40 percent DNA from the target individual.

FIG. 2 shows components of the measurement vector compared against the predictions under three hypotheses. This plot is from chromosome 16 of a sample which is 40 percent target DNA and 60 percent mother DNA. The true hypothesis is H110.

FIG. 3 shows components of the measurement vector compared against the predictions under three hypotheses. FIG. 3 is from chromosome 21 of the same sample as in FIG. 2, and the correct hypothesis is H210.

While the above-identified drawings set forth presently disclosed embodiments, other embodiments are also contemplated, as noted in the discussion. This disclosure presents illustrative embodiments by way of representation and not limitation. Numerous other modifications and embodiments can be devised by those skilled in the art which fall within the scope and spirit of the principles of the presently disclosed embodiments.

DETAILED DESCRIPTION

In an embodiment of the present disclosure, the ploidy state of a target individual can be determined for one, some, or all chromosomes in the individual. In one embodiment of the invention, the genetic material of the target individual is used to make the ploidy determination, where the genetic material of the target individual is contaminated with genetic material of the mother of the target individual. In one embodiment of the invention, genetic data of one or both parents of the target individual, optionally including genetic data from other relatives of the target individual, is used in the ploidy determination.

Copy number calling is the concept of determining the number and identity of chromosomes in an individual, either on a per cell basis or in a bulk manner. In one embodiment of the invention, the amount of genetic material contained in a single cell, a small group of cells, or a sample of DNA may be used as a proxy for the number of chromosomes in the target individual. The present disclosure allows the determination of aneuploidy from the genetic material contained in a small sample of cells, or a small sample of DNA, provided the genome of one or both parents is available. Some aspects of the present disclosure use the concept of parental context, where the parental contexts describe, for a given SNP, the possible set of alleles that a child may have inherited from the parents. For each set of SNPs that belong to a given parental context, a specific statistical distribution of SNP measurements is expected, and that distribution will vary depending on the parental context and on the ploidy state of the chromosome segment on which the SNP is found. By analyzing the actual distributions of the SNPs in different parental contexts, and comparing them with the expected distribution of those SNPs for different ploidy state hypotheses, it is possible to calculate which ploidy state is most likely to be correct. This may be particularly useful in the case of prenatal diagnosis, where a limited amount of DNA is available, and where the determination of the ploidy state of a target, such as a fetus, has a high clinical impact.
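Parental contexts can be illustrated with a short sketch, assuming biallelic SNPs with alleles A and B and genotypes written as two-letter strings. A SNP is labeled by its mother|father genotypes, the possible child genotypes follow from taking one allele from each parent, and mixed-sample measurements are grouped by label so that each group can be compared against its own expected distribution. The label format and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def parental_context(mother: str, father: str) -> str:
    """Label a SNP by its unordered parental genotypes, e.g. 'AA|AB' (mother|father)."""
    return f"{''.join(sorted(mother))}|{''.join(sorted(father))}"


def possible_child_genotypes(mother: str, father: str) -> Set[str]:
    """Genotypes a disomic child could inherit: one allele from each parent."""
    return {"".join(sorted(m + p)) for m in mother for p in father}


def group_by_context(snps: Iterable[Tuple[str, str, float]]) -> Dict[str, List[float]]:
    """Group mixed-sample measurements, given as (mother, father, value), by context."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for mother, father, value in snps:
        groups[parental_context(mother, father)].append(value)
    return dict(groups)
```

For example, the context AA|BB admits only the child genotype AB, so in the absence of measurement errors any B signal at those SNPs in the mixed sample is attributable to the fetus.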

A number of informatics techniques that may be appropriate to use in conjunction with the invention described in this disclosure are described in the following three references: U.S. Publication No. 2007/0184467, published on Aug. 9, 2007, U.S. Publication No. 2008/0243398, published on Oct. 2, 2008, and PCT Publication No. WO/2010/017214, published on Feb. 11, 2010. These references are referred to herein as Rabinowitz 2006, 2008 and 2009, respectively, and the methods described in these references, along with the methods described in this disclosure, may be collectively referred to as PARENTAL SUPPORT™.

DNA measurements, whether obtained by sequencing techniques, genotyping arrays, or any other technique, contain a degree of error. The relative confidence in a given DNA measurement is affected by many factors, including the amplification method, the technology used to measure the DNA, the protocol used, the amount of DNA used, the integrity of the DNA used, the operator, and the freshness of the reagents, to name a few. One way to increase the accuracy of the measurements is to use informatics based techniques to infer the correct genetic state of the DNA in the target based on knowledge of the genetic state of related individuals. Since related individuals are expected to share certain aspects of their genetic state, when the genetic data from a plurality of related individuals is considered together, it is possible to identify likely errors and omissions in the measurements, and increase the accuracy of the knowledge of the genetic states of all the related individuals. In addition, a confidence may be computed for each call made.
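One simple way in which related individuals expose measurement errors is a Mendelian consistency check, sketched below under the assumptions of biallelic SNPs, two-letter genotype strings and a disomic child: a child must carry one allele from each parent, so a trio of calls that violates this rule implies an error in at least one of the three measurements. The converse does not hold, since a consistent trio can still contain errors. The function names are hypothetical, and the check only illustrates the principle; it is not the PARENTAL SUPPORT™ method.

```python
from typing import List, Sequence, Tuple


def mendelian_consistent(mother: str, father: str, child: str) -> bool:
    """True if the child's genotype can be formed from one maternal and one paternal allele."""
    target = sorted(child)
    return any(sorted(m + p) == target for m in mother for p in father)


def flag_inconsistent_snps(trios: Sequence[Tuple[str, str, str]]) -> List[int]:
    """Indices of SNPs whose (mother, father, child) calls violate Mendelian inheritance."""
    return [i for i, (m, f, c) in enumerate(trios) if not mendelian_consistent(m, f, c)]
```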

For the purposes of this disclosure, a computer readable medium is a medium that stores computer data in machine readable form. By way of example, and not limitation, a computer readable medium can comprise computer storage media as well as communication media, methods or signals. Computer storage media, also called computer memory, includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology; CD-ROM, DVD, or other optical storage; cassettes, tape, disk, or other magnetic storage devices; or any other medium which can be used to tangibly store the desired information and which can be accessed by the computer.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. Such computer programs (also known as programs, software, software applications or code) may include machine instructions for a programmable processor, and may be implemented in any form of programming language, including high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. A computer program may be deployed in any form, including as a stand-alone program, or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may be deployed to be executed or interpreted on one computer or on multiple computers at one site, or distributed across multiple sites and interconnected by a communication network.

Definitions

SNP (Single Nucleotide Polymorphism) refers to a single nucleotide that may differ between the genomes of two members of the same species. The usage of the term should not imply any limit on the frequency with which each variant occurs. The term SNP may include other allelic variations that occur over a number of nucleotides.

To call a SNP refers to the act of making a decision about the true state of a particular base pair, taking into account the direct and indirect evidence.

Sequence refers to a DNA sequence or a genetic sequence. It may refer to the primary, physical structure of the DNA molecule or strand in an individual.

Allele refers to the genes that occupy a particular locus.

To call an allele refers to the act of determining the genetic state at a particular locus of DNA. This may involve calling a SNP or a plurality of SNPs, determining whether or not an insertion or deletion is present at that locus, determining the number of insertions that may be present at that locus, or determining whether some other genetic variant, such as a short tandem repeat (STR), is present at that locus, and how many copies of that variant are present.

Locus refers to a specific location of a gene or DNA sequence on a chromosome.

Ploidy calling, also ‘chromosome copy number calling,’ ‘copy number calling,’ ‘ploidy state determination,’ or ‘copy number determination,’ is the act of determining the quantity and possibly also the chromosomal identity of one or more chromosomes present in a cell.

Calling a hypothesis refers to determining which hypothesis has the greatest likelihood of being true. The act of calling may be the point at which a decision is made about which hypothesis will be output as the call.

Confidence refers to the statistical likelihood that the called SNP, allele, set of alleles, or determined number of chromosome copies or chromosome segment copies correctly represents the real genetic state of the individual.

Aneuploidy refers to the state where the wrong number of chromosomes is present in a cell. In the case of a somatic human cell, it may refer to the case where a cell does not contain 22 pairs of autosomal chromosomes and one pair of sex chromosomes. In the case of a human gamete, it may refer to the case where a cell does not contain one of each of the 23 chromosomes. When referring to a single autosomal chromosome, it may refer to the case where more or fewer than two homologous chromosomes are present. When referring to the sex chromosome, it may refer to the case where more or fewer than two of either X or Y chromosomes, or exactly two Y chromosomes, are present.

Ploidy State is the quantity and chromosomal identity of one or more chromosomes in a cell. It may refer to the total number and identity of each chromosome typically found in each cell of a given individual. It may refer to the number and identity of chromosome(s) for a particular chromosome number for a given individual.

Chromosomal identity refers to the referent chromosome number. Normal humans have 22 types of numbered autosomal chromosomes and two types of sex chromosomes. It may also refer to the parental origin of the chromosome. It may also refer to a specific chromosome inherited from the parent, i.e., the chromosome that the parent inherited from his/her father, or the chromosome that the parent inherited from his/her mother. It may also refer to other identifying features of a chromosome. The identity of a chromosome may refer to the actual identity of a particular chromosome, or the identities of the chromosomes of a particular chromosome number, in each cell of a particular individual. For example, the chromosomal identity could be ‘chromosome 21,’ or it could refer to a particular chromosome 21 with a particular genetic state, for example, ‘inherited from the mother, and homologous but not identical to two other chromosome 21s found in a particular female with Down syndrome.’

Chromosomal number refers to the cardinal number commonly assigned to a given chromosome, of which humans have 22 pairs of autosomal chromosomes and one pair of sex chromosomes, for a total of 23. The chromosome number may be a number from 1 to 23, and in the case of chromosome 23, it may be referred to as X or Y. It may refer to a class of chromosomes; for example, a child with Down syndrome may be found to have three chromosome 21s.

The State of the Genetic Material, or simply ‘genetic state,’ refers to the actual identity of a set of SNPs on the DNA; it may refer to the phased haplotypes of the genetic material, and it may refer to the sequence of the DNA, including insertions, deletions, repeats and mutations in an individual. It may also refer to the actual ploidy state of one or more chromosomes, chromosomal segments, or set of chromosomal segments in an individual.

Genetic abnormality refers to a genetic state that is highly correlated with a phenotypic abnormality. Aneuploidy is an example of a genetic abnormality. A genetic state that results in the death of a fetus or young child is a phenotypic abnormality.

Genotypic measurements (or ‘genetic measurements’) refers to a type of genotypic data, such as numerical, digital, pictorial or figurative representations of genotypic data that are obtained by using a genotyping technique to ascertain certain base pair sequences and/or identities, qualities or other characteristics of genetic material, chiefly DNA. Genetic measurements may contain errors or omissions.

Genetic Data refers to data that describes a genetic state. The genetic data may take the form of genetic measurements, it may be encoded in an analog or digital fashion, it may be encoded on a computer, or it may take the form of a physical molecular genetic sequence.

Measuring genetic material refers to the act of transforming genetic data from a physical manifestation, for example, a specific base pair sequence, into a figurative representation of the genetic data, for example the representation of the genetic sequence stored digitally on a computer.

Mosaicism refers to a set of cells in an embryo, or other individual, that are heterogeneous with respect to their ploidy state.

Homologous Chromosomes are chromosomes that contain the same set of genes that may normally pair up during meiosis.

Identical Chromosomes are chromosomes that contain the same set of genes, and for each gene they have the same set of alleles that are identical, or nearly identical.

Allele Drop Out or ‘ADO’ refers to the situation where one of the base pairs in a set of base pairs from homologous chromosomes at a given allele is not detected. ADO may refer to LDO.

Locus Drop Out or ‘LDO’ refers to the situation where both base pairs in a set of base pairs from homologous chromosomes at a given allele are not detected.
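The difference between the two kinds of drop-out can be shown with a small sketch that compares a known true genotype with the set of alleles actually detected. It assumes biallelic genotypes written as two-letter strings; `classify_dropout` is a hypothetical name. Losing one allele at a homozygous site does not change which alleles are seen, so this comparison cannot reveal it.

```python
from typing import Set


def classify_dropout(true_genotype: str, detected: Set[str]) -> str:
    """Classify drop-out at one biallelic locus, e.g. true 'AB' with detected {'A'}."""
    if not detected:
        return "LDO"   # neither allele was detected
    if len(set(true_genotype)) == 2 and len(detected) == 1:
        return "ADO"   # a heterozygous site was read as homozygous
    return "none"      # no drop-out visible from allele identities alone
```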

Homozygous refers to having similar alleles or SNPs at corresponding chromosomal loci.

Heterozygous refers to having dissimilar alleles or SNPs at corresponding chromosomal loci.

Chromosomal Region refers to a segment of a chromosome, or a full chromosome.

Segment of a Chromosome refers to a section of a chromosome that can range in size from one base pair to the entire chromosome.

Chromosome refers to either a full chromosome, or also a segment or section of a chromosome.

Copies refer to the number of copies of a chromosome segment and mayrefer to identical copies, or it may refer to non-identical, homologouscopies of a chromosome segment wherein the different copies of thechromosome segment contain a substantially similar set of loci, andwhere one or more of the alleles are different. Note that in some casesof aneuploidy, such as the M2 copy error, it is possible to have somecopies of the given chromosome segment that are identical as well assome copies of the same chromosome segment that are not identical.

Haplotype is a combination of alleles at multiple loci that are transmitted together on the same chromosome. Haplotype may refer to as few as two loci or to an entire chromosome, depending on the number of recombination events that have occurred between a given set of loci. Haplotype can also refer to a set of single nucleotide polymorphisms (SNPs) on a single chromatid that are statistically associated. A haplotype may also be referred to as a ‘strand’, referring to the fact that haplotypes are physically connected on one strand of DNA.

Haplotypic Data, also called ‘phased data’ or ‘ordered genetic data,’ refers to data from a single chromosome in a diploid or polyploid genome, i.e., either the segregated maternal or paternal copy of a chromosome in a diploid genome.

Phasing refers to the act of determining the haplotypic genetic data of an individual given unordered, diploid (or polyploid) genetic data. It may refer to the act of determining which of two genes at an allele, for a set of alleles found on one chromosome, are associated with each of the two homologous chromosomes in an individual.

Phased Data refers to genetic data where the haplotype has been determined.

Genetic data ‘in’, ‘of’, ‘at’, ‘from’ or ‘on’ an individual (also ‘genotypic data’) refers to the data describing aspects of the genome of an individual. It may refer to one or a set of loci, partial or entire sequences, partial or entire chromosomes, or the entire genome.

Hypothesis refers to a set of possible ploidy states at a given set of chromosomes, or a set of possible allelic states at a given set of loci. The set of possibilities may contain one or more elements.

Copy number hypothesis, also ‘ploidy state hypothesis,’ refers to a hypothesis concerning how many copies of a particular chromosome are in an individual on a per-cell basis. It may also refer to a hypothesis concerning the identity of each of the chromosomes, including the parent of origin of each chromosome, and which of the parent's two chromosomes are present in the individual. It may also refer to a hypothesis concerning which chromosomes, or chromosome segments, if any, from a related individual correspond genetically to a given chromosome from an individual.

Target Individual refers to the individual whose genetic state is being determined. In one context, only a limited amount of DNA is available from the target individual. In one context, the target individual is a fetus. In some embodiments, there may be more than one target individual. In some embodiments, each child, embryo, fetus or sperm that originated from a pair of parents may be considered a target individual.

Related Individual refers to any individual who is genetically related to, and thus shares haplotype blocks with, the target individual. In one context, the related individual may be a genetic parent of the target individual, or any genetic material derived from a parent, such as a sperm, a polar body, an embryo, a fetus, or a child. It may also refer to a sibling or a grandparent.

Parent refers to the genetic mother or father of an individual. An individual will typically have exactly two parents, one mother and one father. A parent may be considered to be an individual.

Parental context (also ‘context’) refers to the genetic state of a given SNP, on each of the two relevant homologous chromosomes for each of the two parents of the target.

Isolation refers to a physical separation of the target genetic material from other contaminating genetic material or biological material. It may also refer to a partial isolation, where the target of isolation is separated from some or most, but not all, of the contaminating material. For example, isolating fetal DNA may refer to isolating a fraction of DNA that is preferentially enriched in fetal DNA as compared to the original sample.

Clinical Decision refers to any decision to take an action, or not to take an action, that has an outcome that affects the health or survival of an individual. In the context of prenatal diagnosis, a clinical decision may refer to a decision to abort or not abort a fetus. A clinical decision may refer to a decision to conduct further testing, or to take mitigating actions.

Platform response refers to the mathematical characterization of the input/output characteristics of a genetic measurement platform, and may be used as a measure of the statistically predictable measurement differences.

Informatics based method refers to a method designed to determine the ploidy state at one or more chromosomes or the allelic state at one or more alleles by statistically inferring the most likely state, rather than by directly physically measuring the state. In one embodiment of the present disclosure, the informatics based technique may be one disclosed in this patent. In one embodiment of the present disclosure, it may be PARENTAL SUPPORT™.

Channel Intensity refers to the strength of the fluorescent or other signal associated with a given allele, base pair or other genetic marker that is output from a method that is used to measure genetic data. It may refer to a set of outputs from a device for measuring DNA. In one embodiment, it may refer to the set of outputs from a genotyping array.

Parental Context

The parental context refers to the genetic state of a given SNP, on each of the two relevant chromosomes for each of the two parents of the target. Note that in one embodiment, the parental context does not refer to the allelic state of the target; rather, it refers to the allelic state of the parents. The parental context for a given SNP may consist of four base pairs, two paternal and two maternal; they may be the same or different from one another. In this disclosure, it may be written as “m₁m₂|f₁f₂”, where m₁ and m₂ are the genetic state of the given SNP on the two maternal chromosomes, and f₁ and f₂ are the genetic state of the given SNP on the two paternal chromosomes. In some embodiments, the parental context may be written as “f₁f₂|m₁m₂”. Note that subscripts “1” and “2” refer to the genotype, at the given allele, of the first and second chromosome; also note that the choice of which chromosome is labeled “1” and which is labeled “2” is arbitrary.

Note that in this disclosure, A and B are often used to generically represent base pair identities; A or B could equally well represent C (cytosine), G (guanine), A (adenine) or T (thymine). For example, if, at a given allele, the mother's genotype is T on one chromosome and G on the homologous chromosome, and the father's genotype at that allele is G on both of the homologous chromosomes, one may say that the target individual's allele has the parental context of AB|BB; in some contexts, it may be equally correct to say that the target individual's allele has the parental context of AB|AA, or BA|AA. Note that, in theory, any of the four possible alleles could occur at a given allele, and thus it is possible, for example, for the mother to have a genotype of AT, and the father to have a genotype of GC at a given allele. However, empirical data indicate that in most cases only two of the four possible base pairs are observed at a given allele. In this disclosure, the discussion assumes that only two possible base pairs will be observed at a given allele, although the embodiments disclosed herein could be modified to take into account the cases where this assumption does not hold.
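The mapping from nucleotides to the generic A/B alphabet in the example above can be sketched as follows. This is an illustrative sketch, not the disclosed implementation. It assumes that exactly two alleles occur at the SNP. The caller chooses which nucleotide is labeled A, and that choice is arbitrary. When the parental data are unphased, each parent's pair is sorted to a canonical form, because AB and BA then denote the same genotype. The function names `to_ab` and `parental_context` are hypothetical.

```python
def to_ab(genotype, allele_a, allele_b):
    """Map a two-nucleotide genotype such as ('T', 'G') onto the generic A/B alphabet.

    Assumes only two alleles occur at the SNP; any other nucleotide raises KeyError.
    """
    mapping = {allele_a: "A", allele_b: "B"}
    return "".join(mapping[base] for base in genotype)


def parental_context(mother, father, allele_a, allele_b, phased=False):
    """Return the parental context string 'm1m2|f1f2' for one SNP.

    With unphased parental data, AB and BA are the same genotype, so each
    parent's pair is sorted into a canonical order (AB, never BA).
    """
    m = to_ab(mother, allele_a, allele_b)
    f = to_ab(father, allele_a, allele_b)
    if not phased:
        m, f = "".join(sorted(m)), "".join(sorted(f))
    return f"{m}|{f}"


# Worked example from the text: mother T/G, father G/G.
print(parental_context(("T", "G"), ("G", "G"), allele_a="T", allele_b="G"))  # AB|BB
print(parental_context(("T", "G"), ("G", "G"), allele_a="G", allele_b="T"))  # AB|AA
print(parental_context(("T", "G"), ("G", "G"), "G", "T", phased=True))       # BA|AA
```

The three calls reproduce the three equivalent labels given in the text: AB|BB, AB|AA and BA|AA.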

A “parental context” may refer to a set or subset of target SNPs that have the same parental context. For example, if one were to measure 1,000 alleles on a given chromosome on a target individual, then the context AA|BB could refer to the set of all alleles in the group of 1,000 alleles where the genotype of the mother of the target is homozygous at the SNP and the genotype of the father of the target is homozygous, but where the maternal genotype and the paternal genotype are dissimilar at that locus. If the parental data is not phased, and thus AB=BA, then there are nine possible parental contexts: AA|AA, AA|AB, AA|BB, AB|AA, AB|AB, AB|BB, BB|AA, BB|AB, and BB|BB. If the parental data is phased, and thus AB≠BA, then there are sixteen different possible parental contexts: AA|AA, AA|AB, AA|BA, AA|BB, AB|AA, AB|AB, AB|BA, AB|BB, BA|AA, BA|AB, BA|BA, BA|BB, BB|AA, BB|AB, BB|BA, and BB|BB. It is also possible for the genetic data from one parent to be phased while the genetic data from the other parent is unphased, in which case there would be twelve parental contexts. Every SNP allele on a chromosome, excluding some SNPs on the sex chromosomes, has one of these parental contexts. Note that some of these contexts may behave the same way as other contexts, and one could lump those contexts together; this could be functionally equivalent to using the full number of contexts. Alternately, one could choose to ignore certain contexts for the purposes of analysis.
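The context counts above follow from taking the Cartesian product of the possible maternal and paternal genotypes. An unphased parent has three possible genotypes and a phased parent has four, which gives 3 × 3 = 9, 4 × 4 = 16 and 4 × 3 = 12 contexts. The sketch below enumerates them. The function name `contexts` is hypothetical.

```python
from itertools import product

UNPHASED = ["AA", "AB", "BB"]      # AB == BA when the parental data are unphased
PHASED = ["AA", "AB", "BA", "BB"]  # AB != BA when the haplotypes are known


def contexts(mother_phased, father_phased):
    """List every parental context 'm1m2|f1f2' for the given phasing status."""
    maternal = PHASED if mother_phased else UNPHASED
    paternal = PHASED if father_phased else UNPHASED
    return [f"{m}|{f}" for m, f in product(maternal, paternal)]


print(len(contexts(False, False)))  # 9
print(len(contexts(True, True)))    # 16
print(len(contexts(True, False)))   # 12
```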

Once the parental contexts have been determined, the SNPs from each parental context may be grouped together, such that the SNP measurements from the target genetic sample may be treated statistically, as a group, and compared with expected behavior for various hypotheses. Grouping the SNPs by context simply refers to creating subsets of SNPs that are differentiated by parental context, where each subset may be treated in a bulk manner. Grouping the SNPs is beneficial because the expected bulk behavior of a set of SNPs depends on its parental context.
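The grouping step can be sketched as a dictionary keyed by parental context, with the mean intensity per group as one possible bulk statistic. This is a minimal sketch under stated assumptions: each SNP is assumed to be already labeled with its context, and the intensity values below are invented for illustration. The names `group_by_context` and `context_means` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_by_context(snps):
    """Group SNP measurements by parental context.

    `snps` is an iterable of (context, intensity) pairs; the result maps each
    context to the list of intensities measured for SNPs in that context.
    """
    groups = defaultdict(list)
    for context, intensity in snps:
        groups[context].append(intensity)
    return dict(groups)


def context_means(snps):
    """Mean measured intensity per parental context (one possible bulk statistic)."""
    return {ctx: mean(vals) for ctx, vals in group_by_context(snps).items()}


# Hypothetical A-channel intensities for five SNPs in two contexts.
measurements = [("AA|BB", 0.75), ("AA|BB", 1.25), ("BB|BB", 0.25),
                ("AA|BB", 1.0), ("BB|BB", 0.5)]
print(context_means(measurements))  # {'AA|BB': 1.0, 'BB|BB': 0.375}
```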

The concept of parental contexts may be useful in the context of ploidy determination. When genotyped, the SNPs within a first parental context may be expected to behave statistically similarly when measured for a given ploidy state. In contrast, some sets of SNPs from a second parental context may be expected to behave statistically differently from those in the first parental context in certain circumstances, such as for certain ploidy states, and the difference in behavior may be characteristic of one or a set of particular ploidy states. There are many statistical techniques that could be used to analyze the measured responses at the various loci within the various parental contexts.

Hypotheses

A hypothesis may refer to a possible genetic state. It may refer to a possible ploidy state. A set of hypotheses refers to a set of possible genetic states. In some embodiments, a set of hypotheses may be designed such that one hypothesis from the set will correspond to the actual genetic state of any given individual. In some embodiments, a set of hypotheses may be designed such that every reasonably possible genetic state may be described by at least one hypothesis from the set. In some embodiments of the present disclosure, one aspect of the method is to determine which hypothesis corresponds to the actual genetic state of the individual in question.

In an embodiment of the present disclosure, one step involves creating a hypothesis. In some embodiments it may be a copy number hypothesis. In some embodiments it may involve a hypothesis concerning which segments of a chromosome from each of the related individuals correspond genetically to which segments, if any, of the other related individuals. Creating a hypothesis may refer to the act of setting the limits of the parameters such that the entire set of possible genetic states that are under consideration are encompassed by those parameters. Creating a hypothesis may refer to the act of setting the limits of the parameters such that a limited set of possible genetic states that are under consideration are encompassed by those parameters. Creating a set of hypotheses may refer to estimating and/or describing the statistically expected bounds of measured values that correspond to each of the hypotheses. Creating a set of hypotheses may refer to a knowledgeable person listing those possible ploidy states that may be reasonably likely under the circumstances. In one embodiment, it may refer to estimating the profile of SNP measurements of a target individual as measured on a high throughput SNP array for a set of parental contexts.

A ‘copy number hypothesis’, also called a ‘ploidy hypothesis’ or a ‘ploidy state hypothesis’, may refer to a hypothesis concerning a possible ploidy state for a given chromosome, or section of a chromosome, in the target individual. It may also refer to the ploidy state at more than one of the chromosomes in the individual. A set of copy number hypotheses may refer to a set of hypotheses where each hypothesis corresponds to a different possible ploidy state in an individual over one chromosome, or it may refer to a combination of single-chromosome hypotheses over more than one chromosome, where the number of different chromosomes could vary, in humans, from 2 to 23. A normal individual contains one of each chromosome from each parent. However, due to errors in meiosis and mitosis, it is possible for an individual to have 0, 1, 2, or more of a given chromosome from each parent. In practice, it is rare to see more than two of a given chromosome from a parent. In this disclosure, the embodiments only consider the possible hypotheses where 0, 1, or 2 copies of a given chromosome come from a parent. In some embodiments, for a given chromosome, there are nine possible hypotheses: the three possible hypotheses concerning 0, 1, or 2 chromosomes of maternal origin, multiplied by the three possible hypotheses concerning 0, 1, or 2 chromosomes of paternal origin. Let (m,f) refer to the hypothesis where m is the number of a given chromosome inherited from the mother, and f is the number of a given chromosome inherited from the father. Therefore, the nine hypotheses are (0,0), (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), and (2,2), and these may also be written H00, H01, H02, H10, H11, H12, H20, H21, H22. The different hypotheses correspond to different ploidy states. For example, (1,1) refers to a normal disomic chromosome; (2,1) refers to a maternal trisomy, and (0,1) refers to a monosomy.
In some embodiments, the hypothesis may be written as $(m, f_x, f_y)$ to take into account the sex chromosome, where $f_x$ refers to an X chromosome or autosomal chromosome inherited from the father, and $f_y$ refers to a Y chromosome inherited from the father. When this notation is used for autosomal chromosomes, $f_y$ may simply act as a placeholder. Thus a euploid embryo that is H101 at chromosome 23 would be a male, and if it were H110 at chromosome 23, it would be a female. For example, H000 represents the nullsomy hypothesis; H100, H010 and H001 represent the monosomy hypotheses; H110 and H101 represent the normal disomy hypotheses; H200, H020, H002, and H011 represent uniparental disomy hypotheses; H210, H120, and H111 represent the trisomy hypotheses; and H220, H211, and H202 represent some of the possible tetrasomy hypotheses.
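The nine autosomal (m, f) hypotheses can be enumerated and named from the total copy number, as sketched below. The state names follow the examples in the text. Treating (2,0) and (0,2) as uniparental disomy mirrors the H200/H020 entries listed for the sex-chromosome notation. The names `HYPOTHESES` and `describe` are hypothetical.

```python
from itertools import product

# Each hypothesis (m, f): m copies inherited from the mother, f from the father,
# with 0, 1 or 2 copies considered from each parent.
HYPOTHESES = list(product(range(3), repeat=2))


def describe(m, f):
    """Name the ploidy state implied by the autosomal hypothesis (m, f)."""
    total = m + f
    if total == 2:
        return "disomy" if (m, f) == (1, 1) else "uniparental disomy"
    return {0: "nullsomy", 1: "monosomy", 3: "trisomy", 4: "tetrasomy"}[total]


for m, f in HYPOTHESES:
    print(f"H{m}{f}: {describe(m, f)}")
```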

In some embodiments, the trisomy case, where two chromosomes are inherited from one parent and one chromosome is inherited from the other parent, may be further differentiated into two cases: one where the two chromosomes are identical (matched copy error), and one where the two chromosomes are homologous but not identical (unmatched copy error).

In some embodiments, where the parental data is phased, and thus each allele may be specified as being part of either of two haplotypes, there are sixteen possible hypotheses. In an embodiment where only one parent is phased, there may be twelve hypotheses. It is possible to use other sets of hypotheses. In an embodiment, some hypotheses that are considered to be unlikely may be discounted.

In some embodiments of the present disclosure, the ploidy hypothesis may refer to a hypothesis concerning which chromosomes from other related individuals correspond to a chromosome found in the target individual's genome. In some embodiments, the method uses the knowledge that related individuals can be expected to share haplotype blocks. Using measured genetic data from related individuals, along with knowledge of which haplotype blocks match between the target individual and the related individual, it is possible to infer the correct genetic data for a target individual with higher confidence than by using the target individual's genetic measurements alone. As such, in some embodiments, the ploidy hypothesis may concern not only the number of chromosomes, but also which chromosomes in related individuals are identical, or nearly identical, to one or more chromosomes in the target individual.

Once the set of hypotheses has been defined, the algorithms operating on the input genetic data may output a determined statistical probability for each of the hypotheses under consideration. The probabilities of the various hypotheses may be determined by mathematically calculating, for each of the various hypotheses, the value of the probability, as stated by one or more of the expert techniques, algorithms, and/or methods described elsewhere in this disclosure, related disclosures, and/or encompassed by the PARENTAL SUPPORT™ technique, using the relevant genetic data as input. The calculation may produce an exact value, it may give an estimate, it may include an error term, it may include a confidence, and it may represent a statistical likelihood.

Once the probabilities of the different hypotheses are calculated, as determined by a plurality of techniques, they may be combined. This may entail, for each hypothesis, multiplying the probabilities as determined by each technique. The product of the probabilities of the hypotheses may be normalized. Note that one ploidy hypothesis refers to one possible ploidy state for a chromosome.
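The combination step described above can be sketched as a product of the per-technique probabilities for each hypothesis, followed by normalization so that the results sum to 1. Multiplying treats the techniques as independent sources of evidence, which is an assumption of this sketch. The name `combine` and the example probabilities are hypothetical.

```python
from math import prod


def combine(prob_tables):
    """Combine per-technique hypothesis probabilities.

    `prob_tables` is a list of dicts (one per technique) mapping hypothesis ->
    probability. For each hypothesis, the probabilities are multiplied across
    techniques; the products are then normalized to sum to 1.
    """
    hypotheses = prob_tables[0].keys()
    products = {h: prod(table[h] for table in prob_tables) for h in hypotheses}
    total = sum(products.values())
    return {h: p / total for h, p in products.items()}


technique_1 = {"H110": 0.5, "H210": 0.5}  # hypothetical, uninformative
technique_2 = {"H110": 0.8, "H210": 0.2}  # hypothetical
print(combine([technique_1, technique_2]))  # {'H110': 0.8, 'H210': 0.2}
```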

In some embodiments, if the probability for a given hypothesis is greater than the probability for any of the other hypotheses, then that hypothesis may be determined to be the most likely. In some embodiments, a hypothesis may be determined to be the most likely, and the ploidy state, or other genetic state, may be called if the normalized probability is greater than a threshold. In one embodiment, this may mean that the number and identity of the chromosomes that are associated with that hypothesis may be called as the ploidy state. In one embodiment, this may mean that the identity of the alleles that are associated with that hypothesis may be called as the allelic state and/or the genetic state. In some embodiments, the threshold may be between about 50% and about 80%. In some embodiments the threshold may be between about 80% and about 90%. In some embodiments the threshold may be between about 90% and about 95%. In some embodiments the threshold may be between about 95% and about 99%. In some embodiments the threshold may be between about 99% and about 99.9%. In some embodiments the threshold may be above about 99.9%.
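The calling rule above picks the most likely hypothesis and reports it only if its normalized probability exceeds a threshold. A sketch of that rule follows. The default threshold of 0.99 is an arbitrary example value taken from within the ranges listed above. Returning `None` to mean "no call" is a convention of this sketch, and the name `call` is hypothetical.

```python
def call(probabilities, threshold=0.99):
    """Return the most likely hypothesis if its normalized probability exceeds
    `threshold`; otherwise return None (no call is made)."""
    total = sum(probabilities.values())
    best = max(probabilities, key=probabilities.get)
    return best if probabilities[best] / total > threshold else None


print(call({"H110": 0.995, "H210": 0.005}))              # H110
print(call({"H110": 0.7, "H210": 0.3}))                  # None
print(call({"H110": 0.7, "H210": 0.3}, threshold=0.5))   # H110
```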

Parental Support

Some embodiments of the disclosed invention may be used in combination with the PARENTAL SUPPORT™ (PS) method, other embodiments of which are described in three patent applications: Rabinowitz 2006, 2008 and 2009. In some embodiments, the methods disclosed herein may be considered as part of the PARENTAL SUPPORT™ method. In some embodiments, the PARENTAL SUPPORT™ method is a collection of methods that may be used to determine, with high accuracy, the genetic data of one or a small number of cells, specifically to determine disease-related alleles, other alleles of interest, and/or the ploidy state of one or more chromosomes in the cell(s). PARENTAL SUPPORT™ may refer to any of these methods. PARENTAL SUPPORT™ is an example of an informatics based method.

The PARENTAL SUPPORT™ method makes use of known parental genetic data, i.e. haplotypic and/or diploid genetic data of the mother and/or the father, together with the knowledge of the mechanism of meiosis and the imperfect measurement of the target DNA, and possibly of one or more related individuals. It uses these to reconstruct, in silico, on a computer, the genotype at a plurality of alleles and/or the ploidy state of an embryo or of any target cell(s), as well as the target DNA at the location of key loci, with a high degree of confidence. The PARENTAL SUPPORT™ method can reconstruct not only single-nucleotide polymorphisms that were measured poorly, but also insertions and deletions, and SNPs or whole regions of DNA that were not measured at all. Furthermore, the PARENTAL SUPPORT™ method can both measure multiple disease-linked loci and screen for aneuploidy, from a single cell or from the same small amount of DNA. In some embodiments, the PARENTAL SUPPORT™ method may be used to characterize one or more cells from embryos biopsied during an IVF cycle to determine the genetic condition of the one or more cells. In some embodiments, the PARENTAL SUPPORT™ method may be used to determine the ploidy state of a fetus from free floating fetal DNA and/or fetal cells that may be found in maternal blood, or from some other source.

In an embodiment, the PARENTAL SUPPORT™ method allows the cleaning of noisy genetic data. This may be done by inferring the correct genetic alleles in the target genome (embryo or fetus) using the genotype of related individuals (parents) as a reference. PARENTAL SUPPORT™ may be particularly relevant where only a small quantity of genetic material is available (e.g. PGD or NIPGD) and where direct measurements of the genotypes are inherently noisy due to the limited amounts of genetic material. The PARENTAL SUPPORT™ method is able to reconstruct highly accurate ordered diploid allele sequences on the embryo, together with the copy number of chromosome segments, even though the conventional, unordered diploid measurements may be characterized by high rates of allele dropouts, drop-ins, variable amplification biases and other errors. The method may employ both an underlying genetic model and an underlying model of measurement error. The genetic model may determine both allele probabilities at each SNP and crossover probabilities between SNPs. Allele probabilities at each SNP may be modeled based on data obtained from the parents, and crossover probabilities between SNPs may be modeled based on data obtained from the HapMap database, as developed by the International HapMap Project. Given the proper underlying genetic model and measurement error model, maximum a posteriori (MAP) estimation may be used, with modifications for computational efficiency, to estimate the correct, ordered allele values at each SNP in the embryo.
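The MAP idea in the paragraph above can be illustrated at a single SNP. This is a minimal sketch and not the PARENTAL SUPPORT™ implementation. It assumes a disomic target and unphased parental genotypes, and it has no crossover model. The prior is plain Mendelian inheritance from the parents. The measurement likelihoods are supplied by the caller, and the values in the example are invented. The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def mendelian_prior(mother, father):
    """P(child genotype) for a disomic child, given unphased parental genotypes
    such as 'AB' and 'BB'. Genotypes are written in sorted order ('AB', never 'BA')."""
    counts = Counter("".join(sorted(m + f)) for m, f in product(mother, father))
    return {g: c / 4 for g, c in counts.items()}


def map_genotype(mother, father, likelihood):
    """Maximum a posteriori genotype: argmax over g of P(data | g) * P(g | parents).

    `likelihood` maps each candidate genotype to P(observed data | genotype),
    as a measurement-error model would supply it.
    """
    prior = mendelian_prior(mother, father)
    return max(prior, key=lambda g: prior[g] * likelihood.get(g, 0.0))


# Invented likelihoods: the noisy measurement looks most like AA (for example,
# after dropout of the B allele), but a child of AB x BB parents cannot be AA.
noisy = {"AA": 0.6, "AB": 0.3, "BB": 0.1}
print(map_genotype("AB", "BB", noisy))  # AB
```

The parental genotypes rule out AA, so the noisy measurement is corrected to AB. This is a single-SNP version of the cleaning described above.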

One aspect of the PARENTAL SUPPORT™ technology is a chromosome copy number calling algorithm that in some embodiments uses parental genotype contexts. To call the chromosome copy number, the algorithm may use the phenomenon of locus dropout (LDO) combined with distributions of expected embryonic genotypes. During whole genome amplification, LDO necessarily occurs. The LDO rate depends on the copy number of the genetic material from which it is derived, i.e., fewer chromosome copies result in higher LDO, and vice versa. As such, it follows that loci with certain contexts of parental genotypes behave in a characteristic fashion in the embryo, related to the probability of allelic contributions to the embryo. For example, if both parents have homozygous BB states, then the embryo should never have AB or AA states. In this case, measurements on the A detection channel are expected to have a distribution determined by background noise and various interference signals, but no valid genotypes. Conversely, if both parents have homozygous AA states, then the embryo should never have AB or BB states, and measurements on the A channel are expected to have the maximum intensity possible given the rate of LDO in a particular whole genome amplification reaction. When the underlying copy number state of the embryo differs from disomy, loci corresponding to the specific parental contexts behave in a predictable fashion, based on the additional allelic content that is contributed by, or is missing from, one of the parents. This allows the ploidy state at each chromosome, or chromosome segment, to be determined. The details of embodiments of this method are described elsewhere in this disclosure.

Platform Response

There are many methods that may be used to measure genetic data. None of the methods currently known in the art are able to measure the genetic data with 100% accuracy; rather, there are always errors, and/or statistical bias, in the data. It may be expected that the method of measurement will introduce certain statistically predictable biases into the measurement. It may be expected that certain sets of DNA, amplified by certain methods and measured with certain techniques, may result in measurements that are qualitatively and quantitatively different from other sets of DNA that are amplified by other methods and/or measured with different techniques. In some cases these errors may be due to the method of measurement. In some cases this error may be due to the state of the DNA. In some cases this bias may be due to the tendency of some types of DNA to respond differently to a given genetic measurement method. In some cases, the measurements may differ in ways that correlate with the number of cells used. In some cases, the measurements may differ based on the measurement technique, for example, which sequencing technique or array genotyping technique is used. In some cases different chromosomes may amplify to different extents. In some cases, certain alleles may be more or less likely to amplify. In some cases, the error, bias, or differential response may be due to a combination of factors. In many or all of these cases, the statistical predictability of these measurement differences, termed the ‘platform response’, may be used to correct for these factors, and can result in data for which the accuracy is maximized, and where each measurement is associated with an appropriate confidence.

The platform response may be described as a mathematical characterization of the input/output characteristics of a genetic measurement platform, such as TAQMAN, the AFFYMETRIX GENECHIP or the ILLUMINA INFINIUM BEADARRAY. The platform response may be specific to a particular platform, to a particular model of genotyping machine, to a particular genotyping machine, or even to a particular scientist using a particular genotyping machine. The input to the channel is the amplified genetic material with any annealed, fluorescently tagged genetic material. The channel output could be allele calls (qualitative) or raw numerical measurements (quantitative), depending on the context. For example, in the case in which the platform's raw numeric output is reduced to qualitative genotype calls, the platform response may consist of an error transition matrix that describes the conditional probability of seeing a particular output genotype call given a particular true genotype input. In one embodiment, in which the platform's output is left as raw numeric measurements, the platform response may be a conditional probability density function that describes the probability of the numerical outputs given a particular true genotype input.
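For qualitative genotype calls, the error transition matrix described above can be represented as a table of P(call | true genotype), in which each row sums to 1. The sketch below uses such a table to compute the likelihood of a set of observed calls under each true genotype. The matrix values are invented for illustration, and in practice they would be estimated for a specific platform. Treating repeated calls as conditionally independent given the true genotype is also an assumption of this sketch. The names `ERROR` and `call_likelihoods` are hypothetical.

```python
GENOTYPES = ["AA", "AB", "BB"]

# Hypothetical error transition matrix: ERROR[true][called] = P(call | true genotype).
# Values are invented; heterozygotes are given a higher chance of being
# miscalled, as might happen with allele dropout. Each row sums to 1.
ERROR = {
    "AA": {"AA": 0.97, "AB": 0.02, "BB": 0.01},
    "AB": {"AA": 0.10, "AB": 0.85, "BB": 0.05},
    "BB": {"AA": 0.01, "AB": 0.02, "BB": 0.97},
}


def call_likelihoods(calls):
    """P(observed calls | true genotype) for each true genotype, assuming the
    calls are conditionally independent given the true genotype."""
    result = {}
    for true in GENOTYPES:
        p = 1.0
        for c in calls:
            p *= ERROR[true][c]
        result[true] = p
    return result


print(call_likelihoods(["AB"]))        # AB is the most likely true genotype
print(call_likelihoods(["AA", "AA"]))  # AA is the most likely true genotype
```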

In some embodiments of the present disclosure, the knowledge of the platform response may be used to statistically correct for the bias. In some embodiments of the present disclosure, the knowledge of the platform response may be used to increase the accuracy of the genetic data. This may be done by performing a statistical operation on the data that counteracts the biasing tendency of the measuring process. It may involve attaching the appropriate confidence to a given datum, such that when combined with other data, the hypothesis found to be most likely is indeed most likely to correspond to the actual genetic state of the individual in question.

In some embodiments of the present disclosure, a statistical method may be used to remove the bias in the data due to the tendency for certain maternal or paternal alleles to amplify in a disproportionate manner to the other alleles. In some embodiments of the present disclosure, a statistical method may be used to remove the bias in the data due to the tendency for certain probes to amplify certain SNPs in a manner that is disproportionate to other SNPs.

Imagine the two-dimensional space where the x-coordinate is the x channel intensity and the y-coordinate is the y channel intensity. In this space, one may expect that the context means should fall on the line defined by the means for contexts BB|BB and AA|AA. In some cases, it may be observed that the average context means do not fall on this line, but are biased in a statistical manner; this may be termed “off line bias”. In some embodiments of the present disclosure, a statistical method may be used to correct for the off line bias in the data.
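One simple way to quantify off line bias is the perpendicular distance from a context mean to the line through the AA|AA and BB|BB context means in the (x channel, y channel) plane. This sketch only measures the deviation. The paragraph above does not specify the correction method, and the coordinates in the example are hypothetical. The name `off_line_distance` is hypothetical.

```python
from math import hypot, sqrt


def off_line_distance(point, aa_mean, bb_mean):
    """Perpendicular distance from a context mean (x, y) to the line through
    the AA|AA and BB|BB context means in (x channel, y channel) space."""
    (x0, y0), (x1, y1), (x2, y2) = point, aa_mean, bb_mean
    dx, dy = x2 - x1, y2 - y1
    # |2-D cross product| / |direction vector| = distance to the infinite line.
    return abs(dx * (y0 - y1) - dy * (x0 - x1)) / hypot(dx, dy)


aa, bb = (1.0, 0.0), (0.0, 1.0)            # hypothetical AA|AA and BB|BB means
print(off_line_distance((0.5, 0.5), aa, bb))  # 0.0 (on the line)
print(off_line_distance((0.5, 1.0), aa, bb))  # about 0.354, i.e. 0.5 / sqrt(2)
```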

In some cases, splayed dots on the context means plot could be caused by translocation. If a translocation occurs, then one may expect to see abnormalities on the endpoints of the chromosome only. Therefore, if the chromosome is broken up into segments, and the context means of each segment are plotted, then those segments affected by a translocation may be expected to respond like a true trisomy or monosomy, while the remaining segments look disomic. In some embodiments of the present disclosure, a statistical method may be used to determine if translocation has occurred on a given chromosome by looking at the context means of different segments of the chromosome.

Ploidy Determination when Genetic Material of the Target Individual is Contaminated

In an embodiment of the method, it is possible to determine the ploidy state of a fetus in a non-invasive manner by measuring fetal DNA contained in maternal blood. Note that this may be complicated considerably by the fact that the amount of fetal DNA available in maternal blood may be small. The amount of fetal free floating DNA found in serum is typically less than 50%, and often less than 20%, and the background maternal free floating DNA makes measurements on the fetal DNA very noisy and difficult to interpret. The number of fetal cells in maternal blood is often less than 1 cell in 100,000, and can be as low as 1 cell in a million, or lower. This method overcomes the difficulties described here, as well as other difficulties known in the art. The method may be applicable in cases where the amount of target DNA is in any proportion with the non-target DNA; for example, the target DNA could make up anywhere between 0.01% and 99.99% of the DNA present.

The first step of the method is to make genomic measurements on the mother and optionally the father, such that the diploid genetic data is known at a large number of alleles for one or both parents. The number of alleles may range from 100 to 100,000,000. In an embodiment, the number of alleles ranges from 500 to 100,000 per chromosome targeted. In an embodiment, the number of alleles ranges from 1,000 to 20,000 per chromosome targeted. In an embodiment of the invention, the alleles are SNPs known to be polymorphic in the human population. Once the parental genotypes are known at a set of SNPs, the SNPs may be subdivided into a number of sets of SNPs where each set corresponds to the set of SNPs in a particular parental context.

One may next determine the platform response of the system using the genetic measurements of certain contexts. One may also determine the ratio of target DNA to maternal DNA in the sample, using the genetic measurements of certain contexts. One may also determine the observed ADO given the observed genetic measurements.
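As one example of using a specific context to estimate the fetal fraction Δ, consider SNPs in context AA|BB. The mother contributes only A alleles. A disomic fetus in this context must be AB, so half of its alleles are A. The expected A-allele fraction in the mixed sample is therefore (1 − Δ)·1 + Δ·0.5 = 1 − Δ/2, and solving for Δ gives Δ = 2(1 − a). This sketch assumes a disomic fetus, no dropout, and an ideal platform that reports the true A-allele fraction a. A real platform response would first have to be corrected for. The function name is hypothetical.

```python
def fetal_fraction_from_aa_bb(a_fraction):
    """Estimate the fetal fraction (Delta) from the A-allele fraction measured
    at SNPs in parental context AA|BB.

    Assumes a disomic AB fetus and an ideal, unbiased platform:
    a = (1 - Delta) * 1 + Delta * 0.5 = 1 - Delta / 2, so Delta = 2 * (1 - a).
    """
    return 2.0 * (1.0 - a_fraction)


print(fetal_fraction_from_aa_bb(0.95))  # approximately 0.1, i.e. about 10% fetal DNA
```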

The next step is to create a number of hypotheses, one for each hypothetical ploidy state of interest on a chromosome of interest, and to determine the expected statistical distribution of genotypic measurements for that hypothetical child, given expected ADO rates and given the expected platform response. At chromosome 21, for example, several hypothetical child genotypes may be envisioned: one for a child that is disomic at chromosome 21 (H110), and one for a child that has maternal trisomy at chromosome 21 (H210). Note that for autosomal chromosomes, (Hαβγ) denotes the hypothesis where α copies of a maternally derived chromosome are present, β copies of a paternally derived chromosome are present, and γ is a placeholder set to 0. In the case of the sex chromosome, (Hαβγ) denotes the hypothesis where α copies of a maternally derived chromosome are present, β indicates the number of paternally derived X chromosomes that are present, and γ indicates the number of paternally derived Y chromosomes that are present.

Note that the hypothetical genotypes are not necessarily SNP-by-SNP genotypes; rather, they may be expected statistical distributions of SNPs within a given parental context. For example, imagine looking only at the parental context AA|AB, meaning the set of SNPs from the target individual where the mother is homozygous and the father is heterozygous. The H110 child is expected to have an equal chance of a SNP being AA or AB within that parental context (on average 1.5 A and 0.5 B copies per SNP), and thus one would expect to see approximately a 3:1 A:B ratio for the SNPs in the AA|AB parental context. The H210 child is expected to have an equal chance of being AAA or AAB within that parental context (on average 2.5 A and 0.5 B copies per SNP), and thus one would expect to see approximately a 5:1 A:B ratio for the SNPs in the AA|AB parental context. By observing the measured channel intensities for the various nucleotides, it may be possible to determine which actual genetic state is most likely for that chromosome: disomy or trisomy.
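The 3:1 and 5:1 ratios can be reproduced with exact fractions. This sketch assumes, as the text does, that each inherited chromosome copy carries either parental allele with probability 1/2; the function names are illustrative only.

```python
from fractions import Fraction


def expected_child_alleles(mother: str, father: str, n_m: int, n_f: int):
    """Average (A, B) copy counts in the child over SNPs in one parental context.

    Assumes each inherited chromosome copy carries either parental allele with
    probability 1/2, so n_m maternal copies contribute n_m * (#A in mother) / 2
    A alleles on average, and likewise for the father.
    """
    a = Fraction(n_m * mother.count("A"), 2) + Fraction(n_f * father.count("A"), 2)
    b = Fraction(n_m * mother.count("B"), 2) + Fraction(n_f * father.count("B"), 2)
    return a, b


def a_to_b_ratio(mother: str, father: str, n_m: int, n_f: int) -> Fraction:
    a, b = expected_child_alleles(mother, father, n_m, n_f)
    return a / b


print(a_to_b_ratio("AA", "AB", 1, 1))  # disomy (H110): prints 3
print(a_to_b_ratio("AA", "AB", 2, 1))  # maternal trisomy (H210): prints 5
```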

Below, certain aspects of an embodiment of the invention are described in more formal, mathematical terms. This section discusses how one can take the parental genetic measurements, and the genetic measurements from the mixed sample of fetal and maternal genetic material, in the form of output from the genotyping platform, and transform those measurements into a copy number call.

Variable Definitions:

y=the average measured intensity from SNPs in a given context on a particular chromosome, on a particular channel

x=the statistically expected number of allele copies present per locus, for the channel being measured, for SNPs in the context.

Δ=the fraction of fetal DNA in the sample

n=the fraction of SNPs that are A for a given genotype

v=a term denoting observational noise, which is a random variable with an unknown distribution.

One may state that x∼(1−Δ)n_(mother)+Δn_(fetus), and also that y=f(x)+v; that is, the distribution of the measurements of a set of SNPs within a given parental context will be some function of the number of expected alleles in the sample and the platform response, plus a noise factor.

In one embodiment of the invention, f(x) may be assumed to be a second-order polynomial, that is, f(x)∼f₁x²+f₂x+f₃. In another embodiment, f(x) may be assumed to be a first-order polynomial, that is, f(x)∼f₁x+f₂. In another embodiment, f(x) may be assumed to be an exponential equation, or other algebraic equation, or some combination thereof. Assume that v is Gaussian distributed with zero mean and standard deviation V.
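The forward model can be written down directly. This is a minimal simulation sketch assuming the second-order polynomial response and zero-mean Gaussian noise described above; the coefficient values in the usage line are illustrative, not calibrated platform values.

```python
import random


def expected_copies(delta: float, n_mother: float, n_fetus: float) -> float:
    """x = (1 - delta) * n_mother + delta * n_fetus."""
    return (1.0 - delta) * n_mother + delta * n_fetus


def platform_response(x: float, f1: float, f2: float, f3: float) -> float:
    """Second-order polynomial response f(x) = f1*x**2 + f2*x + f3."""
    return f1 * x * x + f2 * x + f3


def simulate_measurement(delta, n_mother, n_fetus, f, V, rng=random):
    """One simulated measurement y = f(x) + v, with v ~ N(0, V**2)."""
    x = expected_copies(delta, n_mother, n_fetus)
    return platform_response(x, *f) + rng.gauss(0.0, V)


# AA|BB context, allele A: the mother has 2 copies, a disomic fetus has 1.
rng = random.Random(1)
y = simulate_measurement(0.1, 2, 1, f=(-0.05, 1.0, 0.02), V=0.01, rng=rng)
```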

It should be understood that f(x) could be assumed to be any number of functions, such as a first-order polynomial, a third-order polynomial, any other polynomial, any exponential, or any other algebraic or other relationship between x and y. It should also be understood that v could follow any number of distributions, including a Gaussian, a Rayleigh distribution, a Pearson distribution, or a Bernoulli distribution.

At this point, x is known in terms of Δ and n, while f₁, f₂, f₃, Δ, and the distribution of v, parameterized by V, are unknown. For a given sample, a genotypic measurement, y, is made of the sample for a number of SNPs, for each context, for each channel, over a number of chromosomes, including the chromosome(s) of interest, whose ploidy state is to be determined, as well as at least one chromosome that may be expected to be disomic. Each set of y's is then combined into a vector. Note that the set of chromosomes whose ploidy state is to be determined and the at least one chromosome that may be expected to be disomic may overlap.

For example, in humans, there is a set of chromosomes that can result in a live birth even when aneuploid, most commonly chromosomes 13, 18, 21, X and Y. Live children are also known to be born with aneuploidy at chromosomes 4, 5, 7, 8, 9, 11, 15, 16, and 22. Note that other aneuploidy states, such as translocations and uniparental disomy, at any chromosome may give rise to live-born children with chromosomal abnormalities. One of the chromosomes that is infrequently found to be aneuploid in gestating fetuses with a heartbeat, such as chromosome 1, 2, or 3, may be used as a reference diploid chromosome. Alternately, one of the chromosomes that is targeted for aneuploidy testing may be used as a reference, since it is unlikely that more than one gross chromosomal abnormality exists in a gestating fetus. In one embodiment of the invention, the chromosomes targeted for aneuploidy detection include 13, 18, 21, X and Y.

Given the measured y, or y_(m), for the chromosome that is expected to be disomic, and given the expected number of A's measured in the sample, x_((H110)), one may then find a maximum likelihood estimate for f₁, f₂, f₃, V and Δ. The maximum likelihood estimate may be performed using a non-linear gradient descent method. Once f₁, f₂, f₃, V and Δ have been estimated, distributions may be made for the predicted value of y, y_(p), for the various ploidy state hypotheses, for example y_(predicted(H110)) and y_(predicted(H210)).
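The sketch below illustrates the maximum likelihood fit for the simpler first-order response f(x)=f₁x+f₂, and swaps in a one-dimensional grid search over Δ instead of the gradient descent named in the text. Under i.i.d. Gaussian noise, maximizing the likelihood is least squares, and for a fixed Δ the coefficients f₁, f₂ have a closed-form solution. The channel-A context values come from the H₁₁ expected-value table later in this section; the function name is hypothetical.

```python
import numpy as np

# Channel-A (n_mother, n_fetus) values for a disomic child, context order
# AA|BB, BB|AA, AB|AB, AA|AA, BB|BB, AA|AB, AB|AA, AB|BB, BB|AB.
N_MOTHER = np.array([2, 0, 1, 2, 0, 2, 1, 1, 0], dtype=float)
N_FETUS_H110 = np.array([1, 1, 1, 2, 0, 1.5, 1.5, 0.5, 0.5])


def fit_linear_response(y, n_mother, n_fetus, deltas=np.linspace(0.001, 0.5, 500)):
    """Grid-search ML fit of y = f1*x + f2 + v, x = (1-delta)*n_mother + delta*n_fetus."""
    best = None
    for delta in deltas:
        x = (1.0 - delta) * n_mother + delta * n_fetus
        design = np.column_stack([x, np.ones_like(x)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)  # closed-form f1, f2
        resid = y - design @ coef
        rss = float(resid @ resid)
        if best is None or rss < best[0]:
            best = (rss, delta, coef)
    rss, delta, (f1, f2) = best
    V = np.sqrt(rss / len(y))  # ML estimate of the noise standard deviation
    return float(delta), float(f1), float(f2), float(V)


# Usage: noiseless synthetic data generated with delta = 0.2, f1 = 2.0, f2 = 0.5.
x_true = (1 - 0.2) * N_MOTHER + 0.2 * N_FETUS_H110
y_obs = 2.0 * x_true + 0.5
delta_hat, f1_hat, f2_hat, V_hat = fit_linear_response(y_obs, N_MOTHER, N_FETUS_H110)
```

Δ is identifiable here because the nine contexts give (n_mother, n_fetus − n_mother) pairs that are not collinear.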

The observed y_(m) can be compared against the distributions for y_(p), and the likelihood of each hypothesis, which is the probability of observing y_(m) according to the predicted model, can be determined. The hypothesis with the highest likelihood corresponds to the most likely ploidy state of the fetus. A confidence in the ploidy call may be calculated from the different likelihoods of the various hypotheses.

For a particular chromosome, assume that the likelihoods p(y_(m)|H110), p(y_(m)|H210) and p(y_(m)|H120) have been calculated. Also assume that the prior probability of each hypothesis is known from a statistical population study. For example, p(H110) is the overall probability of disomy on this chromosome for the population of interest. If p(y_(m)|H110) is the highest likelihood, then the confidence in disomy is calculated using Bayes' rule as confidence=p(y_(m)|H110)p(H110)/(p(y_(m)|H110)p(H110)+p(y_(m)|H210)p(H210)+p(y_(m)|H120)p(H120)).
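The Bayes-rule confidence is a normalization of likelihood times prior. This minimal sketch uses illustrative numbers only; they are not taken from any population study.

```python
def posterior_confidence(likelihoods, priors):
    """Posterior probability of each hypothesis via Bayes' rule.

    likelihoods: dict hypothesis -> p(y_m | H); priors: dict hypothesis -> p(H).
    """
    joint = {h: likelihoods[h] * priors[h] for h in likelihoods}
    total = sum(joint.values())
    return {h: j / total for h, j in joint.items()}


# Illustrative numbers only (not from any study).
lik = {"H110": 0.8, "H210": 0.1, "H120": 0.1}
prior = {"H110": 0.98, "H210": 0.01, "H120": 0.01}
post = posterior_confidence(lik, prior)
best = max(post, key=post.get)
```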

Overview of the Method

In an embodiment, the present disclosure presents a method by which one may determine the ploidy state of a gestating fetus, at one or more chromosomes, in a non-invasive manner, using genetic information determined from fetal DNA found in maternal blood, and genetic data from the mother and the father. The fetal DNA may be purified, partially purified, or not purified; genetic measurements may be made on DNA that originated from more than one individual. Informatics-type methods can infer genetic information of the target individual, such as the ploidy state, from the bulk genotypic measurements at a set of alleles. The set of alleles may contain various subsets of alleles, wherein one or more subsets may correspond to alleles that are found on the target individual but not found on the non-target individuals, and one or more other subsets may correspond to alleles that are found on the non-target individual and are not found on the target individual. The set of alleles may also contain subsets of alleles where the allele is found on the target and the non-target in differing expected ratios. The method may involve comparing ratios of measured output intensities for various subsets of alleles to expected ratios given various potential ploidy states. The platform response may be determined, and a correction for the bias of the system may be incorporated into the method. The ploidy determination may be made with a computed confidence. The ploidy determination may be linked to a clinical action. That clinical action may be to terminate or not terminate a pregnancy. An embodiment of the invention involves the case where the target individual is a fetus, and the non-target individual is the biological mother of the fetus.

In a basic explanation, the method works as follows. A simple version of the idea is to attempt to quantify the amount of fetal DNA at SNPs where the fetus has an allele that the mother does not. First, the genotypic data of the parents are measured using a method that produces data for a set of SNPs. Then the SNPs are sorted into parental contexts. The SNPs found in contexts where the mother is heterozygous, AB, are considered to be less informative, since the contaminating DNA in maternal blood will contain a large amount of both alleles. The SNPs found in contexts where the mother and father have the same set of alleles are also considered to be less informative, since the background and the fetal signal are the same. The simple method focuses on the contexts where the father has an allele that the mother does not, for example AA|AB and AA|BB (and BB|AB and BB|AA, though these are the same, by symmetry). In the case of the AA|BB context, the fetus is expected to be AB, and therefore the B allele should appear in fetal DNA. In the case of the AA|AB context, the fetus is expected to be AA half the time and AB half the time, meaning the B allele should appear in fetal DNA half the time.
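The selection rule of the simple method, keeping contexts where the father carries an allele the mother lacks, can be expressed as a set difference. This short sketch assumes context labels of the form `mother|father`; the function name is illustrative.

```python
CONTEXTS = ["AA|BB", "BB|AA", "AB|AB", "AA|AA", "BB|BB",
            "AA|AB", "AB|AA", "AB|BB", "BB|AB"]


def paternal_only_alleles(context: str) -> set:
    """Alleles carried by the father but absent from the mother."""
    mother, father = context.split("|")
    return set(father) - set(mother)


informative = [ctx for ctx in CONTEXTS if paternal_only_alleles(ctx)]
print(informative)  # ['AA|BB', 'BB|AA', 'AA|AB', 'BB|AB']
```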

Once the appropriate contexts have been selected and the SNPs have been grouped by parental context (for example, the contexts where the mother is AA), the SNPs at which the B allele has been measured, indicating that the fetus is AB, are identified, along with the quantities of DNA measured for each of those SNPs. The intensities of the measurements of these SNPs on a chromosome assumed to be disomic are then compared to the intensities of the measurements of the SNPs on the chromosome of interest, adjusted appropriately for platform response. If the intensities of the SNPs on the two chromosomes are about equal, then the chromosome of interest is considered to be disomic, and if the intensities on the chromosome of interest are about 50% greater than the intensities on the assumed disomic chromosome, then the chromosome of interest is considered to be paternal trisomic.

Note that this is a basic explanation of a simple version of the method. In an embodiment, some or all of the contexts may be used, including those of greater and lesser informativeness. In an embodiment, some or all of the SNPs may be used. For those contexts and SNPs that are more informative, for example the SNPs in the AA|BB context, the measurements may have greater weight in the overall calculation. For those contexts and SNPs that are less informative, for example the SNPs in the AA|AA context, the measurements may have lesser weight in the overall calculation. The explanation above focuses on measuring the number of paternal chromosomes. A similar method may be used to determine the number of maternal chromosomes, with appropriate adjustments made. For example, the expected ratios of SNP intensities for the disomy and trisomy hypotheses will be different, because the background maternal genotypic data and the fetal genotypic data will be similar or identical. For example, in a case where the mixed sample contains 20% fetal DNA and 80% maternal DNA, looking at the AA|BB context, for a disomy one would expect a ratio of 90:10 for the A:B measurements (80% A plus 20% 1:1 A:B), for a maternal trisomy one would expect a ratio closer to 93.3:6.7 (80% A plus 20% 2:1 A:B), and for a paternal trisomy one would expect a ratio closer to 86.7:13.3 (80% A plus 20% 1:2 A:B).
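The 90:10, 93.3:6.7 and 86.7:13.3 figures follow from weighting each genome by its share of the sample and splitting it by its own A:B ratio. The sketch below reproduces that arithmetic exactly as the text states it; the function name is illustrative.

```python
def mixture_b_fraction(fetal_fraction, maternal_b_share, fetal_b_share):
    """Fraction of B signal in the mixed sample, following the text's arithmetic:
    each genome is weighted by its share of the sample, then split by its A:B ratio."""
    return (1 - fetal_fraction) * maternal_b_share + fetal_fraction * fetal_b_share


# AA|BB context with 20% fetal DNA: the mother is AA, so she contributes no B.
for label, fetal_b in [("disomy (AB)", 1 / 2),
                       ("maternal trisomy (AAB)", 1 / 3),
                       ("paternal trisomy (ABB)", 2 / 3)]:
    b = mixture_b_fraction(0.2, 0.0, fetal_b)
    print(f"{label}: A:B = {100 * (1 - b):.1f}:{100 * b:.1f}")
# disomy (AB): A:B = 90.0:10.0
# maternal trisomy (AAB): A:B = 93.3:6.7
# paternal trisomy (ABB): A:B = 86.7:13.3
```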

Note that this method may be used equally well with more or less genotypic information from the parents. For example, if the father's genotype is unknown, the method may consider all contexts where the mother is homozygous (AA) to be more informative, and the chance of the fetus having a B allele may be calculated roughly from known SNP heterozygosities in the population. At the same time, if the father's genotype is phased, that is, the haplotypes are known, copy number accuracies may be increased, since there will be strong correlations between expected contexts. For example, imagine three correlated SNPs on a chromosome where the contexts are AA|AB, AA|BA, AA|AB (the father is phased). If the B allele is detected in maternal blood for the first SNP, there is a much higher probability of detecting a B for the third SNP, as opposed to the second SNP, since a euploid fetus inherits only one haplotype from each parent. Similarly, if the mother's genotype is phased, accuracies are increased, since there will be more expected correlations between expected fetal contributions to the relative SNP intensities.

Using each of the parent contexts, and chromosomes known to be euploid, it is possible to estimate, by a set of simultaneous equations, the amount of DNA in the maternal blood from the mother and the amount of DNA in the maternal blood from the fetus. These simultaneous equations are made possible by the knowledge of the alleles present on the mother, and optionally, the father. In an embodiment, the genetic data from both the mother and the father are used. In particular, alleles present on the father and not present on the mother provide a direct measurement of fetal DNA. One may then look at the particular chromosomes of interest, such as chromosome 21, and see whether the measurements on this chromosome under each parental context are consistent with a particular hypothesis, such as H_(mp), where m represents the number of maternal chromosomes and p represents the number of paternal chromosomes, e.g. H₁₁ representing euploidy, or H₂₁ and H₁₂ representing maternal and paternal trisomy, respectively.

In some embodiments of the invention, the method may be employed with knowledge of the maternal genotype and without knowledge of the paternal genotype. In this case, one could infer father contexts by looking at the SNP data for those measurements on the mixed sample that cannot be explained by the mother's data. One would begin by identifying the SNPs where the mother is homozygous (AA), and then look at the SNP data from the mixed sample for B alleles. For those SNPs, it is possible to infer that the father was AB or BB, and the fetus is AB. Likewise, for SNPs where the mother is AA and no B was measured in the mixed sample, it is possible to infer that the fetus is AA with a certain probability, where the probability is correlated to the ADO and LDO rates. It is also possible to use parental data with a certain degree of uncertainty attached to the measurements. The methods described herein can be adapted to determine the ploidy state of the fetus given greater or lesser amounts of genetic information from the parents.
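A minimal sketch of the mother-only inference: at maternal-AA SNPs, a detectable B signal in the mixed sample implies a fetal B from the father. The fixed `threshold` is a hypothetical stand-in for a proper detection model, and the "probable" label reflects that dropout could hide a B.

```python
def infer_fetal_calls(maternal_genotypes, b_signal, threshold):
    """Infer fetal genotypes at SNPs where the mother is homozygous AA.

    maternal_genotypes: dict snp -> maternal genotype string.
    b_signal: dict snp -> measured B-channel signal in the mixed sample.
    threshold: hypothetical detection threshold for the B allele.
    """
    calls = {}
    for snp, genotype in maternal_genotypes.items():
        if genotype != "AA":
            continue  # only maternal-AA SNPs are used here
        if b_signal[snp] > threshold:
            calls[snp] = "AB"  # the B must come from the father (AB or BB)
        else:
            # No B seen: probably AA, but allele or locus dropout could hide a B.
            calls[snp] = "AA (probable)"
    return calls


mother = {"s1": "AA", "s2": "AA", "s3": "AB"}
signal_b = {"s1": 0.35, "s2": 0.02, "s3": 0.90}
calls = infer_fetal_calls(mother, signal_b, threshold=0.1)
```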

Some Assumptions

Note that these assumptions do not need to be true for this method to function as intended; rather, they represent the idealized case for which this derivation is designed.

- The expected amount of genetic material in the maternal blood from the mother is constant across all loci.
- The expected amount of genetic material present in the maternal blood from the fetus is constant across all loci, assuming the chromosomes are euploid.
- The chromosomes that are non-viable (excluding 13, 18, 21, X, Y) are all euploid in the fetus. In one embodiment, only some of the non-viable chromosomes in the fetus need be euploid.

General Problem Formulation

One may write y_(ijk)=g_(ijk)(x_(ijk))+v_(ijk), where x_(ijk) is the quantity of DNA on the allele k=1 or 2 (1 represents allele A and 2 represents allele B), j=1 . . . 23 denotes the chromosome number, i=1 . . . N denotes the locus number on the chromosome, g_(ijk) is the platform response for the particular locus and allele ijk, and v_(ijk) is independent noise on the measurement for that locus and allele. The amount of genetic material is given by x_(ijk)=am_(ijk)+Δc_(ijk), where a is the amplification factor (or net effect of leakage, diffusion, amplification, etc.) of the genetic material present on each of the maternal chromosomes, m_(ijk) (either 0, 1, or 2) is the copy number of the particular allele on the maternal chromosomes, Δ is the amplification factor of the genetic material present on each of the child chromosomes, and c_(ijk) is the copy number (either 0, 1, 2, or 3) of the particular allele on the child chromosomes. Note that for the first simplified explanation, a and Δ are assumed to be independent of locus and allele, i.e. independent of i, j, and k. Thus it can be stated:

$y_{ijk} = g_{ijk}(a\,m_{ijk} + \Delta\,c_{ijk}) + v_{ijk}$

Approach Using an Affine Model That is Uniform Across All Loci

One may model g with an affine model, and for simplicity assume that the model is the same for each locus and allele, although it will be obvious after reading this disclosure how to modify the approach when the affine model is dependent on i, j, k. Assume the platform response model is

$g_{ijk}(x_{ijk}) = b + a\,m_{ijk} + \Delta\,c_{ijk}$

where the amplification factors a and Δ are used without loss of generality, and a y-axis intercept b is added, which defines the noise level when there is no genetic material. The goal is to estimate a and Δ. It is also possible to estimate b independently, but in this section the noise level is assumed to be roughly constant across loci, and only the set of equations based on parent contexts is used to estimate a and Δ. The measurement at each locus is given by

$y_{ijk} = b + a\,m_{ijk} + \Delta\,c_{ijk} + v_{ijk}$

Assuming that the noise v_(ijk) is independent and identically distributed (i.i.d.) for each of the measurements within a particular parent context, T, one can average the signals within that parent context. The parent contexts are represented in terms of alleles A and B, where the first two alleles represent the mother and the second two alleles represent the father: T∈{AA|BB, BB|AA, AB|AB, AA|AA, BB|BB, AA|AB, AB|AA, AB|BB, BB|AB}. For each context T, there is a set of loci i, j where the parent DNA conforms to that context, represented i, j∈T. Hence:

$y_{T,k} = \frac{1}{N_T}\sum_{i,j \in T} y_{i,j,k} = b + a\,\overline{m_{k,T}} + \Delta\,\overline{c_{k,T}} + \overline{v_{k,T}}$

where $\overline{m_{k,T}}$, $\overline{c_{k,T}}$ and $\overline{v_{k,T}}$ represent the means of the respective values over all the loci conforming to the parent context T, i.e. over all i, j∈T. The mean or expected values $\overline{c_{k,T}}$ will depend on the ploidy status of the child. The table below gives the mean or expected values $\overline{m_{k,T}}$ and $\overline{c_{k,T}}$ for k=1 (allele A) or 2 (allele B) and all the parent contexts T. The expected values are calculated assuming different hypotheses on the child, for example euploidy and maternal trisomy. The hypotheses are denoted by the notation H_(mf), where m refers to the number of chromosomes from the mother and f refers to the number of chromosomes from the father, e.g. H₁₁ is euploid and H₂₁ is maternal trisomy. Note that there is symmetry between some of the states by switching A and B, but all states are included for clarity:

Context           AA|BB  BB|AA  AB|AB  AA|AA  BB|BB  AA|AB  AB|AA  AB|BB  BB|AB
m_(A,T)             2      0      1      2      0      2      1      1      0
m_(B,T)             0      2      1      0      2      0      1      1      2
c_(A,T) | H₁₁       1      1      1      2      0     1.5    1.5    0.5    0.5
c_(B,T) | H₁₁       1      1      1      0      2     0.5    0.5    1.5    1.5
c_(A,T) | H₂₁       2      1     1.5     3      0     2.5     2      1     0.5
c_(B,T) | H₂₁       1      2     1.5     0      3     0.5     1      2     2.5

These expected values define a set of equations for all the y_(T,k), which may be cast in matrix form as follows:

$Y = B + A_H P + v$

Where

- Y=[y_(AA|BB,1) y_(BB|AA,1) y_(AB|AB,1) y_(AA|AA,1) y_(BB|BB,1) y_(AA|AB,1) y_(AB|AA,1) y_(AB|BB,1) y_(BB|AB,1) y_(AA|BB,2) y_(BB|AA,2) y_(AB|AB,2) y_(AA|AA,2) y_(BB|BB,2) y_(AA|AB,2) y_(AB|AA,2) y_(AB|BB,2) y_(BB|AB,2)]^(T) is the 18×1 matrix of context means
- P=[a Δ]^(T) is the matrix of parameters to estimate
- B=b$\vec{1}$, where $\vec{1}$ is the 18×1 matrix of ones
- v=[v_(A,AA|BB) . . . v_(B,BB|AB)]^(T) is the 18×1 matrix of noise terms

and A_(H) is the matrix encapsulating the data in the table, where the values are different for each hypothesis H on the ploidy state of the child. Below are examples of the matrix A_(H) for the ploidy hypotheses H₁₁ and H₂₁:

$A_{H_{11}} = {{\begin{bmatrix}2.0 & 1.0 \\0 & 1.0 \\1.0 & 1.0 \\2.0 & 2.0 \\0 & 0 \\2.0 & 1.5 \\1.0 & 1.5 \\1.0 & 0.5 \\0 & 0.5 \\0 & 1.0 \\2.0 & 1.0 \\1.0 & 1.0 \\0 & 0 \\2.0 & 2.0 \\0 & 0.5 \\1.0 & 0.5 \\1.0 & 1.5 \\2.0 & 1.5\end{bmatrix}\mspace{31mu} A_{H_{21}}} = \begin{bmatrix}2.0 & 2.0 \\0 & 1.0 \\1.0 & 1.5 \\2.0 & 3.0 \\0 & 0 \\2.0 & 2.5 \\1.0 & 2.0 \\1.0 & 1.0 \\0 & 0.5 \\0 & 1.0 \\2.0 & 2.0 \\1.0 & 1.5 \\0 & 0 \\2.0 & 3.0 \\0 & 0.5 \\1.0 & 1.0 \\1.0 & 2.0 \\2.0 & 2.5\end{bmatrix}}$
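The matrices above can be generated from the context list rather than typed by hand. This sketch assumes the same inheritance rule used for the table (each inherited copy carries either parental allele with probability 1/2), which reproduces every entry of A_(H₁₁) and A_(H₂₁); `design_matrix` is a hypothetical name.

```python
import numpy as np

CONTEXTS = ["AA|BB", "BB|AA", "AB|AB", "AA|AA", "BB|BB",
            "AA|AB", "AB|AA", "AB|BB", "BB|AB"]


def design_matrix(n_m: int, n_f: int) -> np.ndarray:
    """Build the 18x2 matrix A_H for hypothesis H_{n_m n_f}.

    Column 0 is the maternal copy number m of the allele; column 1 is the
    expected child copy number c. Rows 0-8 are allele A and rows 9-17 are
    allele B, both in the context order above.
    """
    rows = []
    for allele in "AB":
        for ctx in CONTEXTS:
            mother, father = ctx.split("|")
            m = mother.count(allele)
            c = 0.5 * n_m * mother.count(allele) + 0.5 * n_f * father.count(allele)
            rows.append([m, c])
    return np.array(rows, dtype=float)


A_H11 = design_matrix(1, 1)  # euploid
A_H21 = design_matrix(2, 1)  # maternal trisomy
```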

In order to estimate a and Δ, or matrix P, the data across all chromosomes that may be assumed to be euploid in the child sample are aggregated. This would include some or all of the chromosomes j=1 . . . 23 that have been measured, except those that are uncertain and thus under test. In one embodiment, the uncertain chromosomes include j=13, 18, 21, X and Y. In one embodiment, one could also apply a concordance test to the results on the individual chromosomes in order to detect mosaic aneuploidy on the non-viable chromosomes. To clarify notation, define Y′ as Y measured over all the euploid chromosomes, and Y″ as Y measured over a particular chromosome under test, such as chromosome 21, which may be aneuploid. Apply the matrix A_(H₁₁) to the euploid data in order to estimate the parameters:

$\hat{P} = \arg\min_P \lVert Y' - B - A_{H_{11}} P \rVert_2 = (A_{H_{11}}^T A_{H_{11}})^{-1} A_{H_{11}}^T \tilde{Y}$

where $\tilde{Y} = Y' - B$, i.e. the measured data with the bias removed. The least-squares solution above is the maximum-likelihood solution only if each of the terms in the noise matrix v has a similar variance. This is not always the case, most simply because the number of loci N′_(T) used to compute the mean measurement for each context T may be different for each context. Here N′_(T) refers to the number of loci used on the chromosomes known to be euploid, and C′ denotes the covariance matrix for mean measurements on the chromosomes known to be euploid. There are many approaches to estimating the covariance C′ of the noise matrix v, which may be assumed to be distributed as v∼N(0, C′). Given the covariance matrix, the maximum-likelihood estimate of P is

$\hat{P} = \arg\min_P \lVert C'^{-1/2}(Y' - B - A_{H_{11}} P) \rVert_2 = (A_{H_{11}}^T C'^{-1} A_{H_{11}})^{-1} A_{H_{11}}^T C'^{-1} \tilde{Y}$

One simple approach to estimating the covariance matrix is to assume that all the terms of v are independent (i.e. no off-diagonal terms) and invoke the Central Limit Theorem so that the variance of each term of v scales as 1/N′_(T), and then find the 18×18 matrix

$C^{\prime}\begin{bmatrix}{1/N_{{AA}{BB}}^{\prime}} & \ldots & 0 \\\vdots & \ddots & \vdots \\0 & \ldots & {1/N_{{BB}{AB}}^{\prime}}\end{bmatrix}$

Once $\hat{P}$ has been estimated, these parameters are used to determine the most likely hypothesis on the chromosome under study, such as chromosome 21. In other words, the following hypothesis may be chosen:

$H^* = \arg\min_H \lVert C''^{-1/2}(Y'' - B - A_H \hat{P}) \rVert_2$

Having found H*, one can then estimate the degree of confidence in the determination of H*. Assume, for example, that there are two hypotheses under consideration: H₁₁ (euploid) and H₂₁ (maternal trisomy). Assume that H*=H₁₁. The distance measures corresponding to each of the hypotheses may be computed as follows:

$d_{11} = \lVert C''^{-1/2}(Y'' - B - A_{H_{11}} \hat{P}) \rVert_2$

$d_{21} = \lVert C''^{-1/2}(Y'' - B - A_{H_{21}} \hat{P}) \rVert_2$

It can be shown that the square of each of these distance measures is roughly distributed as a chi-squared random variable with 18 degrees of freedom. Let χ₁₈ represent the corresponding probability density function for such a variable. One may then find the ratio of the probabilities p_(H) of each of the hypotheses according to:

$\frac{p_{H_{11}}}{p_{H_{21}}} = \frac{\chi_{18}(d_{11}^2)}{\chi_{18}(d_{21}^2)}$

The probabilities of each hypothesis may be calculated by adding the constraint p_(H₁₁)+p_(H₂₁)=1. The confidence that the chromosome is in fact euploid is given by p_(H₁₁).
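The selection of H* and the chi-squared confidence can be combined in one routine. This sketch assumes C″=diag(σ²/N″_(T)) with a known per-locus noise level σ (an assumption, since the text only fixes C″ up to scale), evaluates the χ₁₈ density in log space for numerical stability, and normalizes the two densities so that p_(H₁₁)+p_(H₂₁)=1. The data are synthetic.

```python
import math

import numpy as np

# Design matrices A_H11 and A_H21 from the text: column 0 is m, column 1 is c.
M = [2, 0, 1, 2, 0, 2, 1, 1, 0, 0, 2, 1, 0, 2, 0, 1, 1, 2]
A_H = {
    "H11": np.column_stack([M, [1, 1, 1, 2, 0, 1.5, 1.5, 0.5, 0.5,
                                1, 1, 1, 0, 2, 0.5, 0.5, 1.5, 1.5]]).astype(float),
    "H21": np.column_stack([M, [2, 1, 1.5, 3, 0, 2.5, 2, 1, 0.5,
                                1, 2, 1.5, 0, 3, 0.5, 1, 2, 2.5]]).astype(float),
}


def chi2_logpdf(x, k=18):
    """Log-density of a chi-squared variable with k degrees of freedom."""
    x = max(x, 1e-300)  # guard against log(0) for a perfect fit
    return ((k / 2 - 1) * math.log(x) - x / 2
            - (k / 2) * math.log(2) - math.lgamma(k / 2))


def call_ploidy(y_test, n_loci, P_hat, b, sigma=1.0):
    """Pick H* by minimum distance and normalise chi-squared densities into p_H.

    Assumes C'' = diag(sigma**2 / N''_T), so C''^{-1/2} = diag(sqrt(N''_T) / sigma).
    """
    w = np.sqrt(np.asarray(n_loci, dtype=float)) / sigma
    resid = {h: np.asarray(y_test, dtype=float) - b - A @ P_hat for h, A in A_H.items()}
    d = {h: float(np.linalg.norm(w * r)) for h, r in resid.items()}
    logs = {h: chi2_logpdf(dist ** 2) for h, dist in d.items()}
    top = max(logs.values())
    weights = {h: math.exp(v - top) for h, v in logs.items()}
    total = sum(weights.values())
    probs = {h: wt / total for h, wt in weights.items()}
    return min(d, key=d.get), probs


# Synthetic test chromosome drawn from H11 with a = 1.0, delta = 0.1, b = 0.05.
rng = np.random.default_rng(0)
n_loci = np.full(18, 1000.0)
P_hat = np.array([1.0, 0.1])
y_test = 0.05 + A_H["H11"] @ P_hat + rng.normal(0.0, 1.0, 18) / np.sqrt(n_loci)
best, probs = call_ploidy(y_test, n_loci, P_hat, b=0.05)
```

Because the chi-squared density is not monotone, a near-perfect fit (d² close to 0) receives low density; the ratio is meaningful when d² is of the order of the degrees of freedom.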

In some embodiments, it is possible to modify the above approach for different biases b on each of the channels representing alleles A and B. The bias matrix B is redefined as follows:

$B = \begin{bmatrix}{b_{A}\overset{\rightarrow}{1}} \\{b_{B}\overset{\rightarrow}{1}}\end{bmatrix}$

where $\vec{1}$ is a 9×1 matrix of ones. As discussed above, the parameters b_(A) and b_(B) can either be assumed based on a-priori measurements, or can be included in the matrix P and actively estimated (i.e. there is sufficient rank in the equations over all the contexts to do so).

In one embodiment, in the general formulation, where y_(ijk)=g_(ijk)(am_(ijk)+Δc_(ijk))+v_(ijk), one can directly measure or calibrate the function g_(ijk) for every locus and allele, so that the function (which is monotonic for the vast majority of genotyping platforms) can be inverted. One can then use the function inverse to recast the measurements in terms of the quantity of genetic material so that the system of equations is linear, i.e. y′_(ijk)=g_(ijk)⁻¹(y_(ijk))=am_(ijk)+Δc_(ijk)+v′_(ijk). This approach works particularly well when g_(ijk) is an affine function, so that the inversion does not produce amplification or biasing of the noise in v′_(ijk).

In some embodiments, the modified noise term v′_(ijk)=g_(ijk)⁻¹(v_(ijk)) may be amplified or biased by the function inversion. Another embodiment, which may be preferable from a noise perspective, is to linearize the measurements around an operating point, i.e.:

$y_{ijk} = g_{ijk}(a\,m_{ijk} + \Delta\,c_{ijk}) + v_{ijk}$

may be recast as:

$y_{ijk} \approx g_{ijk}(a\,m_{ijk}) + g'_{ijk}(a\,m_{ijk})\,\Delta\,c_{ijk} + v_{ijk}$

in the case where the fraction of free-floating DNA in the maternal blood from the child is small, Δ≪a, and the expansion is a reasonable approximation. Alternatively, for a platform response such as that of the ILLUMINA BEADARRAY, which is monotonically increasing and for which the second derivative is typically negative, one can improve the linearization estimate according to y_(ijk)≈g_(ijk)(am_(ijk))+0.5(g′_(ijk)(am_(ijk))+g′_(ijk)(am_(ijk)+Δc_(ijk)))Δc_(ijk)+v_(ijk). The resulting set of equations may be solved iteratively for a and Δ using a method such as Newton-Raphson optimization.
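To see why averaging the slope helps for a concave response, the sketch below compares both approximations against the exact value. The saturating function g(x)=1−e^(−x) is a hypothetical stand-in for a platform response (increasing, with negative second derivative), not a model of any particular array; the noise term is omitted.

```python
import math


def g(x):
    """Hypothetical saturating platform response: increasing, concave (g'' < 0)."""
    return 1.0 - math.exp(-x)


def g_prime(x):
    return math.exp(-x)


def first_order(am, dc):
    """g(am) + g'(am) * dc: tangent-line expansion around the maternal term."""
    return g(am) + g_prime(am) * dc


def averaged_slope(am, dc):
    """g(am) + 0.5 * (g'(am) + g'(am + dc)) * dc: averaged-slope estimate."""
    return g(am) + 0.5 * (g_prime(am) + g_prime(am + dc)) * dc


am, dc = 1.0, 0.2  # a*m_ijk and delta*c_ijk (illustrative values)
exact = g(am + dc)
err_first = abs(first_order(am, dc) - exact)
err_avg = abs(averaged_slope(am, dc) - exact)
print(f"first-order error {err_first:.5f}, averaged-slope error {err_avg:.5f}")
```

The tangent-line error shrinks with the square of Δc, while the averaged-slope (trapezoidal) error shrinks with its cube, which is why the latter helps when Δc is not negligible.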

In some embodiments, one may measure the total amount of DNA on the test chromosome (mother plus fetus) and compare it with the amount of DNA on all other chromosomes, based on the assumption that the amount of DNA should be constant across all chromosomes. In order to estimate confidence bounds meaningfully, one may look at the standard deviation across the signals of other chromosomes that should be euploid to estimate the signal variance and generate a confidence bound. In order to calibrate out the amplification biases among different chromosomes, one may find a regression function linking each chromosome's mean signal level to every other chromosome's mean signal level, combine the signals from all chromosomes by weighting based on the variance of the regression fit, and check whether the test chromosome of interest is within the acceptable range as defined by the other chromosomes.
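A deliberately simplified sketch of the range check: it assumes the chromosome means have already been calibrated for amplification bias and skips the regression and variance-weighting steps, expressing the test chromosome's deviation in reference standard deviations. The numbers are illustrative.

```python
import statistics


def chromosome_z_score(test_mean, reference_means):
    """Distance of the test chromosome's mean signal from the reference
    chromosomes, in units of their standard deviation."""
    mu = statistics.mean(reference_means)
    sd = statistics.stdev(reference_means)
    return (test_mean - mu) / sd


# Illustrative, already-calibrated means for chromosomes assumed euploid.
reference = [1.00, 1.10, 0.90]
z = chromosome_z_score(1.50, reference)
```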

In some embodiments, this method may be used in conjunction with other methods previously disclosed by Gene Security Network, especially those methods that are part of PARENTAL SUPPORT™ and are mentioned elsewhere in this disclosure, such that one may phase the parents so that it is known what is contained on each individual maternal and paternal chromosome. By considering the odds ratio of each of the alleles at heterozygous loci, one may determine which haplotype of the mother is present in the child. Then one can compare the signal level of the measurable maternal haplotype to the paternal haplotype that is present (without background noise from the mother) and see when the expected 1:1 ratio is not satisfied due to aneuploidy, which causes an imbalance between maternal and paternal alleles.

This list of possible variations on the method is not meant to be exhaustive. Other variations may also be employed. Note that in this disclosure, for the purposes of calculation, certain assumptions may have been made about parameters, characteristics of the data, variables, etc. In these cases, other assumptions may be made that do not change the essence of the invention.

Modeling

In one embodiment, the raw data may be produced by a microarray which measures the response from each possible allele on a selection of SNPs. In an embodiment, the microarray may be an ILLUMINA SNP microarray or an AFFYMETRIX SNP microarray. In other embodiments, other sources of data may also be used, such as a sufficiently large number of TAQMAN probes or a non-SNP based array. The raw genetic data may come from other sources as well, such as DNA sequencing.

In this embodiment, a SNP is typically expected to be one of two nucleotides. For example, it may be expected to be either a G or C, and may be measured for the G and C response; alternately, at a SNP which could have A or T, it may be measured for the A and T response. Since only two alleles are possible at each SNP, the measurements may be aggregated without regard for whether the SNP is A/T or C/G. Instead, this disclosure may refer to responses on the x and y channels, and generic alleles A or B. Thus the possible genotypes in this example are AA, AB and BB for all SNPs. There are other ways of grouping the allele calls that will not affect the essence of the invention.

Measurements may be initially aggregated over SNPs from the same parent context based on unordered parent genotypes. Each context may be defined by the number of A and number of B alleles from each parent: [a_(m) b_(m) a_(f) b_(f)], where a_(m)+b_(m)=2 and a_(f)+b_(f)=2. For example, all SNPs where the mother's genotype is AA and the father's genotype is BB are members of the AA|BB context. The combination of 3 possible genotypes over 2 parents means that the measurements from a single chromosome would consist of 18 context means, 9 on each channel. Consider a copy number hypothesis for the child of the form (n_(m), n_(f)), where n_(m) is the number of mother copies and n_(f) is the number of father copies of the chromosome. Let the expected number of As (averaged over SNPs) in the child be k_(x) and the expected number of Bs be k_(y) (for a particular context, conditioned on a hypothesis). The expected number of alleles depends on the context and the hypothesis.

$k_x = 0.5\,a_m n_m + 0.5\,a_f n_f$

$k_y = 0.5\,b_m n_m + 0.5\,b_f n_f \qquad (1)$

The amount of DNA measured at a SNP will depend on the number of alleles present at that SNP in the maternal and fetal chromosomes, and on the overall concentrations of DNA present in the sample from the mother and fetus. The factor α reflects the overall concentration of DNA in the sample, and the ratio of mother to child is 1 to δ.

For SNPs in contexts where both parents are homozygous, the genotype of a disomic child is known. For example, if one parent's genotype is AA and the other's is BB, the child genotype must be AB. In contrast, SNPs where a parent is heterozygous will have an unknown child genotype. Consider the context AB|BB, where the child may inherit either an A or a B from the mother. The most general assumption is that the child will inherit the A and the B with equal probability, and so approximately half of the child genotypes in this context will be AB and half will be BB. Other assumptions may be made regarding the likelihood of a child inheriting a given allele from a given parent. Although the genotype of each child SNP is not known, the average values of k_(x) and k_(y) are thus known for SNPs in each context, and so the equations below refer to these averages.

In the example where the parent context is AB|BB, the average number of As in the child SNPs is 0.5 and the average number of Bs is 1.5. The quantities x_(x) and x_(y) refer to the average amount of DNA present for SNPs in a particular context, where x_(x) is the DNA that will be measured on the x channel (allele A) and x_(y) is the DNA that will be measured on the y channel (allele B).

$x_x = \alpha(m_x + \delta k_x)$

$x_y = \alpha(m_y + \delta k_y) \qquad (2)$

The quantity of DNA may be measured through the platform responses on the x and y channels. SNPs in the same context may be aggregated to produce measurements y_(x), y_(y), which are the context mean responses on the x and y channels. Assume that SNPs are i.i.d.

Extensive analysis (for the whole chromosome mean algorithm, as part of PARENTAL SUPPORT™) has found systematic differences in amplification between chromosomes. Let y_(c) and y₁ be the means from the same context and same sample, from chromosome c and chromosome 1 respectively. The expected value of y_(c)/y₁ is defined as β_(c) and may be calculated from a large set of training data. The training data consists of hundreds of blastomeres which have been analyzed under a consistent laboratory protocol. The chromosome weights β depend on microarray type (because different arrays measure different SNPs) and the type of lysis buffer used, but otherwise may be consistent between samples. Therefore, the expected number of As or Bs may be weighted by β to account for this effect, resulting in a chromosome-weighted number of alleles m̂ or k̂.

$\hat{m}_{xc} = m_{xc}\beta_{xc}, \qquad \hat{m}_{yc} = m_{yc}\beta_{yc}$

$\hat{k}_{xc} = k_{xc}\beta_{xc}, \qquad \hat{k}_{yc} = k_{yc}\beta_{yc}$

By accounting for chromosome variation using a weighted number of alleles, the platform response model f_(x)(x_(x)), f_(y)(x_(y)) may be considered consistent across chromosomes. However, the bias b may be observed to vary by chromosome and channel, and the measurement noise v will vary on each measurement. The bias of a particular chromosome and channel is the mean of the noise-only context, and is therefore a known (directly measured) quantity. The noise-only contexts are AA|AA for the y channel and BB|BB for the x channel, because in these cases the expected number of the measured allele is zero. Thus, the measurement gives a baseline for the platform response in the absence of the signal which it measures. The scalar noise variance associated with each context mean measurement may be assumed to be proportional to 1/n, where n is the number of SNPs included. This corresponds to the assumption of i.i.d. SNPs within each context. The noise components may be assumed independent and normally distributed.

$y_x = f_x(x_x) + b_x + v_x$

$y_y = f_y(x_y) + b_y + v_y$
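The aggregation into context means and the bias read-out from the noise-only contexts can be sketched as follows. This assumes per-SNP responses for one chromosome arrive as (context, x response, y response) tuples; because the bias varies by chromosome and channel, `channel_bias` would be applied to each chromosome's context means separately. Both function names are illustrative.

```python
from collections import defaultdict
from statistics import mean

def context_means(snps):
    """Aggregate per-SNP responses into context means.

    `snps` is an iterable of (context, x_response, y_response) tuples, with
    context strings such as 'AB|BB'. Returns {context: (y_x, y_y, n)}, where
    n is the number of SNPs averaged; the noise variance of each mean is
    taken to be proportional to 1/n.
    """
    groups = defaultdict(list)
    for context, x, y in snps:
        groups[context].append((x, y))
    return {
        ctx: (mean(v[0] for v in vals), mean(v[1] for v in vals), len(vals))
        for ctx, vals in groups.items()
    }

def channel_bias(means):
    """Bias (b_x, b_y) from the noise-only contexts of one chromosome:
    BB|BB carries no A alleles, so its x mean estimates b_x; AA|AA carries
    no B alleles, so its y mean estimates b_y."""
    return means["BB|BB"][0], means["AA|AA"][1]
```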

Quadratic Platform Response

In one embodiment, a linear platform response model (affine relationship between amount of DNA and measured signal) may be used. In another embodiment, a quadratic platform response f(x)=f₁x²+f₂x may be used. In some embodiments, a quadratic platform response may be used where f₁ and f₂ are specific to each sample and measurement channel and x is the quantity of DNA. Other platform response models may be employed, including higher order algorithmic or exponential relationships. Substituting from (2) for the quantity of DNA results in the following model for the x and y channel responses on chromosome c from context i.

$y_{xci} = f_{1x}\alpha^2\hat{m}_{xci}^2 + f_{1x}\alpha^2\delta^2\hat{k}_{xci}^2 + 2f_{1x}\alpha^2\delta\,\hat{m}_{xci}\hat{k}_{xci} + f_{2x}\alpha\hat{m}_{xci} + f_{2x}\alpha\delta\hat{k}_{xci} + b_{xc} + v_{xci}$

$y_{yci} = f_{1y}\alpha^2\hat{m}_{yci}^2 + f_{1y}\alpha^2\delta^2\hat{k}_{yci}^2 + 2f_{1y}\alpha^2\delta\,\hat{m}_{yci}\hat{k}_{yci} + f_{2y}\alpha\hat{m}_{yci} + f_{2y}\alpha\delta\hat{k}_{yci} + b_{yc} + v_{yci} \qquad (3)$
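The forward model in equation (3) can be sketched as one function for one channel and one context. It assumes scalar inputs and applies the chromosome weight β to both the maternal and child allele counts; the parameter names are hypothetical. Since the expanded sum is just f(α(m̂+δk̂)) + b with f(x)=f₁x²+f₂x, that identity serves as a consistency check.

```python
def quadratic_response(m, k, beta, alpha, delta, f1, f2, bias):
    """Noise-free response of one channel for one context, equation (3).

    m, k   : expected maternal and child allele counts on this channel
    beta   : chromosome weight for this chromosome and channel
    alpha  : overall DNA concentration; delta: child-to-mother DNA ratio
    f1, f2 : coefficients of the platform response f(x) = f1*x**2 + f2*x
    bias   : channel bias b for this chromosome
    """
    m_hat, k_hat = m * beta, k * beta  # chromosome-weighted allele counts
    return (f1 * alpha**2 * m_hat**2
            + f1 * alpha**2 * delta**2 * k_hat**2
            + 2 * f1 * alpha**2 * delta * m_hat * k_hat
            + f2 * alpha * m_hat
            + f2 * alpha * delta * k_hat
            + bias)
```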

Without loss of generality, the DNA concentration α and the platform responses f_(1x), f_(2x), f_(1y), f_(2y) may be combined, together with δ, to form the set of 5 parameters for the sample. Note that the model includes terms of the form p_(1x)δ², p_(1x)δ and p_(2x)δ, and so the parameter estimate cannot be solved exactly using linear methods.

$y_{xci} = p_{1x}\hat{m}_{xci}^2 + p_{1x}\delta^2\hat{k}_{xci}^2 + 2p_{1x}\delta\,\hat{m}_{xci}\hat{k}_{xci} + p_{2x}\hat{m}_{xci} + p_{2x}\delta\,\hat{k}_{xci} + b_{xc} + v_{xci} = g_{xci}(p) + b_{xc} + v_{xci} \qquad (5)$

$y_{yci} = p_{1y}\hat{m}_{yci}^2 + p_{1y}\delta^2\hat{k}_{yci}^2 + 2p_{1y}\delta\,\hat{m}_{yci}\hat{k}_{yci} + p_{2y}\hat{m}_{yci} + p_{2y}\delta\,\hat{k}_{yci} + b_{yc} + v_{yci} = g_{yci}(p) + b_{yc} + v_{yci}$

$p = \begin{bmatrix} p_{1x} \\ p_{2x} \\ p_{1y} \\ p_{2y} \\ \delta \end{bmatrix} = \begin{bmatrix} f_{1x}\alpha^2 \\ f_{2x}\alpha \\ f_{1y}\alpha^2 \\ f_{2y}\alpha \\ \delta \end{bmatrix}$

In this description, this set of parameters p may be assumed to be common to all chromosomes and parent genotype contexts for a single sample, and so the model for a single chromosome c and context i can be written in the following condensed form based on the non-linear platform response function g.

$\begin{bmatrix} y_{xci} \\ y_{yci} \end{bmatrix} = \begin{bmatrix} g_{xci}(p) \\ g_{yci}(p) \end{bmatrix} + \begin{bmatrix} b_{xc} \\ b_{yc} \end{bmatrix} + \begin{bmatrix} v_{xci} \\ v_{yci} \end{bmatrix}, \qquad y_{ci} = g_{ci}(p) + b_c + v_{ci}$

The set of N measurements from a sample can be combined to form a vector equation in p.

$\begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} g_1(p) \\ \vdots \\ g_N(p) \end{bmatrix} + \begin{bmatrix} b_1 \\ \vdots \\ b_N \end{bmatrix} + \begin{bmatrix} v_1 \\ \vdots \\ v_N \end{bmatrix}, \qquad y = g(p) + b + v$

In other embodiments, the parameters may be different for different chromosomes, or for different samples.

Linearized Quadratic Platform Response

Consider the linearization of the quadratic platform response at x=x₀:

$f(x) \approx f_1x_0^2 + f_2x_0 + (2f_1x_0 + f_2)(x - x_0)$

Substituting the mother's contribution αm̂ for the nominal DNA quantity x₀, and α(m̂+δk̂) for x, results in the following model.

$y_{xci} = f_{1x}\alpha^2\hat{m}_{xci}^2 + 2f_{1x}\alpha^2\delta\,\hat{m}_{xci}\hat{k}_{xci} + f_{2x}\alpha\hat{m}_{xci} + f_{2x}\alpha\delta\hat{k}_{xci} + b_{xc} + v_{xci}$

$y_{yci} = f_{1y}\alpha^2\hat{m}_{yci}^2 + 2f_{1y}\alpha^2\delta\,\hat{m}_{yci}\hat{k}_{yci} + f_{2y}\alpha\hat{m}_{yci} + f_{2y}\alpha\delta\hat{k}_{yci} + b_{yc} + v_{yci}$

Although the platform response is linearized, the model is still non-linear in the set of unknown model parameters defined in (5). In one embodiment, a linear estimation method can be implemented by constructing an augmented parameter set which eliminates the non-linear terms by adding extra degrees of freedom. This augmented parameter set has 8 degrees of freedom. In another embodiment, it is possible to attempt this type of linear solution for the full quadratic model. The four parameters for the x channel are shown, and those for the y channel are defined similarly.

$\begin{matrix}{q_{x} = {\begin{bmatrix}q_{1x} \\q_{2x} \\q_{3x} \\q_{4x}\end{bmatrix} = \begin{bmatrix}{f_{1x}\alpha^{2}} \\{f_{2x}\alpha} \\{f_{1x}\alpha^{2}\delta} \\{f_{2x}\alpha \; \delta}\end{bmatrix}}} & (6)\end{matrix}$

Using this set of parameters, the linearized model for a chromosome c and context i can be written in matrix form.

$A_{xci} = \begin{bmatrix} \hat{m}_{xci}^2 & \hat{m}_{xci} & 2\hat{m}_{xci}\hat{k}_{xci} & \hat{k}_{xci} \end{bmatrix}, \qquad A_{yci} = \begin{bmatrix} \hat{m}_{yci}^2 & \hat{m}_{yci} & 2\hat{m}_{yci}\hat{k}_{yci} & \hat{k}_{yci} \end{bmatrix}, \qquad A_{ci} = \begin{bmatrix} A_{xci} & 0 \\ 0 & A_{yci} \end{bmatrix}$

$\begin{bmatrix} y_{xci} \\ y_{yci} \end{bmatrix} = A_{ci}\begin{bmatrix} q_x \\ q_y \end{bmatrix} + \begin{bmatrix} b_{xc} \\ b_{yc} \end{bmatrix} + \begin{bmatrix} v_{xci} \\ v_{yci} \end{bmatrix}, \qquad y_{ci} = A_{ci}q + b_c + v_{ci}$

The measurements from all chromosomes, contexts and channels may be combined into a single matrix equation with parameters q in R⁸ as follows:

$\begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} A_1 \\ \vdots \\ A_N \end{bmatrix} q + \begin{bmatrix} b_1 \\ \vdots \\ b_N \end{bmatrix} + \begin{bmatrix} v_1 \\ \vdots \\ v_N \end{bmatrix}, \qquad y = Aq + b + v \qquad (7)$

Recall that y are the context mean measurements, A is the set of known coefficients, q is the set of parameters to be estimated, b is the known bias vector, and v is assumed to be zero-mean Gaussian noise.

Parameter Estimation

In one embodiment, the strategy for parameter estimation is to assume that a subset of the child's chromosomes are disomic (having one copy from each parent) and use these to learn the model parameters for the child sample. These sample model parameters are then used to classify the remaining chromosomes, determining how many copies are present from each parent. Thus, the child allele contributions k̂_(xci), k̂_(yci) may be calculated from (1) at the parameter estimation step under the assumption that the mother and father copy number contributions n_(m) and n_(f) are both one. If D is the number of assumed disomic chromosomes, then the measurement vector y for parameter estimation has size 18D (from nine context means measured on two channels).

Linearized Quadratic Sensor Model

The linearized quadratic model (7) leads to straightforward least-squares (LS) or maximum likelihood (ML) solutions for the best estimate of q. The maximum likelihood solution depends on the number of SNPs incorporated in each measurement, given in the diagonal matrix N. In an embodiment, the maximum likelihood solution is used because the informativeness of the different measurement components varies widely, and the matrix N which determines this variation is known.

$q^*_{LS} = \arg\min_q \lVert y - (Aq + b) \rVert^2 = (A^TA)^{-1}A^T(y - b)$

$q^*_{ML} = \arg\max_q P(y; q) = \arg\min_q \lVert N^{0.5}(y - Aq - b) \rVert^2 = (A^TNA)^{-1}A^TN(y - b)$
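A minimal NumPy sketch of the ML estimate for one channel. Because A_(ci) is block diagonal and N is diagonal, the x-channel and y-channel parameters decouple, so each channel can be solved on its own with four columns per row. Scaling each row by the square root of its SNP count turns the weighted problem into ordinary least squares, which `numpy.linalg.lstsq` solves. The helper names `design_row` and `ml_estimate` are hypothetical.

```python
import numpy as np

def design_row(m_hat, k_hat):
    """One row of A for a single channel, matching the columns of (6)."""
    return [m_hat**2, m_hat, 2 * m_hat * k_hat, k_hat]

def ml_estimate(A, y, b, n_snps):
    """Weighted least squares for y = A q + b + v, Cov(v) = gamma * N^-1.

    Equivalent to q* = (A^T N A)^-1 A^T N (y - b) with N = diag(n_snps).
    """
    A = np.asarray(A, dtype=float)
    r = np.asarray(y, dtype=float) - np.asarray(b, dtype=float)
    sw = np.sqrt(np.asarray(n_snps, dtype=float))
    # Row scaling by sqrt(n) makes the weighted problem an ordinary LS one.
    q, *_ = np.linalg.lstsq(A * sw[:, None], r * sw, rcond=None)
    return q
```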

Quadratic Sensor Model

The quadratic sensor model may not lead to closed-form solutions for the parameter estimate p which best fits the measurements. In another embodiment, a gradient descent optimization method may be applied which iteratively improves on an initial guess for p in order to minimize a cost function. A non-linear least squares formulation for p minimizes the mean square difference between the measured data and the values predicted by the model.

$p^* = \arg\min_p \lVert y - g(p) - b \rVert^2$

Commercial non-linear optimization functions, such as MATLAB's FMINCON, use iterative methods to find a local minimum of a user-provided cost function by numerically approximating the function's gradient.
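The sketch below minimizes the same cost with a plain Gauss-Newton iteration and a forward-difference Jacobian in NumPy. Gauss-Newton is swapped in here for the gradient-descent or FMINCON route named in the text; it is one way to solve the problem, not the disclosed implementation. It uses the fact that the model (5) for one channel equals p₁s² + p₂s with s = m̂ + δk̂, and assumes each measurement row is tagged with its channel. There is no step-size control, so it relies on a reasonable starting point, such as one derived from the linearized estimate q*. The names `model_g` and `fit_nonlinear` are hypothetical.

```python
import numpy as np

def model_g(p, m_hat, k_hat, is_y):
    """Model (5) for a batch of rows, with p = [p1x, p2x, p1y, p2y, delta].

    Each channel's response is p1*s**2 + p2*s with s = m_hat + delta*k_hat;
    the boolean array is_y marks the y-channel rows.
    """
    p1 = np.where(is_y, p[2], p[0])
    p2 = np.where(is_y, p[3], p[1])
    s = m_hat + p[4] * k_hat
    return p1 * s**2 + p2 * s

def fit_nonlinear(y, b, m_hat, k_hat, is_y, p0, iters=50, eps=1e-6, tol=1e-12):
    """Gauss-Newton minimization of ||y - g(p) - b||^2 starting from p0."""
    p = np.asarray(p0, dtype=float).copy()

    def residual(p):
        return y - model_g(p, m_hat, k_hat, is_y) - b

    for _ in range(iters):
        r = residual(p)
        # Forward-difference Jacobian of the residual with respect to p.
        J = np.empty((r.size, p.size))
        for j in range(p.size):
            step_j = np.zeros_like(p)
            step_j[j] = eps
            J[:, j] = (residual(p + step_j) - r) / eps
        # Gauss-Newton step: least-squares solution of J @ dp = -r.
        dp, *_ = np.linalg.lstsq(J, -r, rcond=None)
        p += dp
        if np.linalg.norm(dp) < tol:
            break
    return p
```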

The parameter estimate q* based on the linearized model may provide a convenient initial condition for the non-linear optimization because it solves an approximation of the same problem but can be calculated in closed form at little computational cost. Comparison of the linearized (q) and non-linear (p) parameters below shows that q has 8 components while p has only 5, so the mapping from p to q is not invertible: an arbitrary q need not correspond to any p.

${q = {\begin{bmatrix}q_{1x} \\q_{2x} \\q_{3x} \\q_{4x} \\q_{1y} \\q_{2y} \\q_{3y} \\q_{4y}\end{bmatrix} = \begin{bmatrix}{f_{1x}\alpha^{2}} \\{f_{2x}\alpha} \\{f_{1x}\alpha^{2}\delta} \\{f_{2x}\alpha \; \delta} \\{f_{1y}\alpha^{2}} \\{f_{2y}\alpha} \\{f_{1y}\alpha^{2}\delta} \\{f_{2y}\alpha \; \delta}\end{bmatrix}}},\mspace{31mu} {p = {\begin{bmatrix}p_{1x} \\p_{2x} \\p_{1y} \\p_{2y} \\\delta\end{bmatrix} = \begin{bmatrix}{f_{1x}\alpha^{2}} \\{f_{2x}\alpha^{2}} \\{f_{1y}\alpha^{2}} \\{f_{2y}\alpha^{2}} \\\delta\end{bmatrix}}}$

The mapping from p to q will be written q(p) and is as follows.

$q(p) = \begin{bmatrix} p_{1x} & p_{2x} & p_{1x}\delta & p_{2x}\delta & p_{1y} & p_{2y} & p_{1y}\delta & p_{2y}\delta \end{bmatrix}^T$

Given q=q*_(ML), select p₀=argmin_p ∥q−q(p)∥₂, which has a closed-form polynomial solution. Then p₀ may be used as an initial condition for an iterative solution of p*=argmin_p ∥y−g(p)−b∥².
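The closed-form initialization can be sketched as follows, assuming q is ordered as in q(p). For a fixed δ, each pair (u, v) that q(p) maps to (p, pδ) is best fit by p = (u + δv)/(1 + δ²); substituting back leaves (s_vv − 2δs_uv + δ²s_uu)/(1 + δ²), whose stationary points solve s_uv δ² + (s_uu − s_vv)δ − s_uv = 0, where s_uu, s_uv, s_vv are inner products of the paired components. This derivation and the name `initial_p_from_q` are illustrative, not quoted from the disclosure.

```python
import numpy as np

def initial_p_from_q(q):
    """p0 = argmin_p ||q - q(p)||_2 for q = [q1x, q2x, q3x, q4x, q1y, ..., q4y].

    Returns p0 ordered as [p1x, p2x, p1y, p2y, delta].
    """
    q = np.asarray(q, dtype=float)
    u = q[[0, 1, 4, 5]]  # targets for p1x, p2x, p1y, p2y
    v = q[[2, 3, 6, 7]]  # targets for p1x*delta, p2x*delta, p1y*delta, p2y*delta
    s_uu, s_uv, s_vv = u @ u, u @ v, v @ v
    # Stationary points of the residual as a function of delta (a quadratic).
    roots = np.roots([s_uv, s_uu - s_vv, -s_uv])
    candidates = [float(r.real) for r in roots if abs(r.imag) < 1e-12] or [0.0]

    def cost(d):
        return (s_vv - 2 * d * s_uv + d * d * s_uu) / (1 + d * d)

    delta = min(candidates, key=cost)
    p_pairs = (u + delta * v) / (1 + delta**2)
    return np.array([*p_pairs, delta])
```

When q lies exactly in the range of q(p), the sketch recovers the generating p; otherwise it returns the nearest p in the 2-norm, which is what the iterative fit needs as a starting point.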

An estimate of the distribution of the noise vector v may be used in the calculation of observation likelihoods. The fit error vector e=y−g(p*)−b is a sample from the distribution of v. Recall that the assumption of i.i.d. SNPs implies that each context mean will have variance inversely proportional to the number of SNPs it includes. Thus, the covariance V of v has the form γN⁻¹, where γ is a scalar and N is the diagonal matrix defining the number of SNPs measured in each context mean. The matrix N is known, and γ is estimated as the variance of the components of N^(0.5)e.

Copy Number Determination

After estimating the model parameters for a particular sample based on a set of known disomic chromosomes, the task is to estimate the copy number for the chromosome of interest, or for the remainder of the chromosomes. Recall that a child copy number hypothesis has the form H_(n_m n_f), where (n_(m), n_(f)) represent the number of copies contributed by the mother and father, respectively. In an embodiment, the focus is placed on detection of trisomies, where one parent contributes an extra copy, because these errors may result in a viable fetus and conditions such as Down syndrome. The copy number hypothesis predicts the expected number of child alleles present at a SNP with a particular parent context, according to (1). For example, consider the context AA|BB, where the mother has genotype AA and the father has genotype BB. Under the disomy hypothesis H11, the child's genotype will be AB, but under the maternal trisomy hypothesis H21 the child's genotype will be AAB, and a higher signal on the x channel can be detected due to the extra A. The number of child alleles present appears in the matrix A in the linearized model and in the function g(p) in the quadratic model, and depends on the assumed hypothesis in this manner. Thus, the assumption of a particular copy number hypothesis h results in a corresponding model A^(h)q or g^(h)(p). The various hypotheses will be evaluated by considering the likelihood of the observed data under the different models.

Consider the measurement vector y_(c)∈R¹⁸ from a chromosome c. Recall that y_(c) contains the 18 context mean measurements from the chromosome, where each is an average of the measurements from SNPs in a parent genotype context. Substitution of a hypothesis into the learned model results in a distribution p(y_(c)|h), which is implicitly dependent on the learned model parameters. The probabilities of the various hypotheses h can be solved for from the likelihoods {p(y_(c)|h)} by incorporating priors using Bayes' rule. Classification is possible when the distributions p(y_(c)|h_(i)) and p(y_(c)|h_(j)) are distinguishable for different hypotheses h_(i) and h_(j). For a single chromosome, define d_(ij) as the mean square difference in model output comparing hypotheses h_(i) and h_(j).

$d_{ij} = \frac{1}{18}\sum_{k=1}^{18}\left( g_k^{h_i}(p^*) - g_k^{h_j}(p^*) \right)^2$

where g_k^(h)(p*) denotes the k-th of the 18 model outputs for the chromosome under hypothesis h.

A high-confidence call between hypotheses h_(i) and h_(j) can be expected when d_(ij) is large compared to the sensor noise variance.
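A one-function sketch of d_(ij), assuming the 18 predicted context means for each hypothesis are already available as arrays (for example from a forward model evaluated at p*). The name `hypothesis_separation` is hypothetical.

```python
import numpy as np

def hypothesis_separation(pred_i, pred_j):
    """d_ij: mean squared difference between the predicted context means of
    hypotheses h_i and h_j for one chromosome (18 values each)."""
    diff = np.asarray(pred_i, dtype=float) - np.asarray(pred_j, dtype=float)
    return float(np.mean(diff ** 2))
```

Comparing the returned value with the estimated noise variance of the context means indicates whether a confident call between the two hypotheses is plausible.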

Estimation may be based on the quadratic sensor model, y=g(p)+b+v. Conditioned on a set of model parameters and a hypothesis, the measurement vector y_(c) is normally distributed with mean g^(h)(p*)+b and covariance V=γN⁻¹. By defining the error vector e_(c)^(h)=y_(c)−g^(h)(p*)−b for a hypothesis h and chromosome c, it is possible to see that e_(c)^(h) is normally distributed with zero mean and covariance V, and that e_(c)^(hT)V⁻¹e_(c)^(h) has the chi-squared distribution with 18 degrees of freedom.

${p\left( {y_{v}h} \right)} = {p_{x_{18}^{2}}\left( {\left( {y_{c} - {g^{h}\left( p^{*} \right)} - b} \right)\frac{1}{\gamma}{N\left( {y_{c} - {g^{h}\left( p^{*} \right)} - b} \right)}} \right)}$

Copy Number Calling with Phased Paternal Genetic Data

In an embodiment of the invention, phased father genotype data may be used. This section describes an extension of the embodiment described earlier, designed for the case where phased father genotypic data is available; by taking advantage of the phased parental data, it allows for more accurate parameter estimation and hypothesis fitting.

When the genotype data is phased, the AB genotype can be distinguished from the BA genotype. In the AB genotype, the first haplotype has the A allele at a given locus and the second haplotype has the B allele at the locus, whereas in the BA genotype the first haplotype has the B allele at the locus and the second haplotype has the A allele at the locus. When the genotype is unphased, or unordered, no distinction is made between AB and BA, and the genotype is typically referred to as AB.

Phasing of the father's genotype may be done by various methods, including several that may be found in the three patent applications Rabinowitz 2006, 2008 and 2009 that are incorporated by reference. It is assumed in this section that phased father genotypic data is available, meaning that, on all chromosomes, the ordered father genotype is known at all SNPs, i.e. one can distinguish between the first and second haplotypes of the father's genotype. If the father's genotypic data is phased, so that AB≠BA for the father, while the mother's data is not phased, i.e. AB=BA for the mother, then there are twelve different possible parental contexts: AA|AA, AA|AB, AA|BA, AA|BB, AB|AA, AB|AB, AB|BA, AB|BB, BB|AA, BB|AB, BB|BA, and BB|BB.

Measurements may be initially aggregated over SNPs from the same parental context based on phased father genotypes. Each context may be defined by the number of A and number of B alleles from the mother, from the first father strand and from the second father strand: [a_(m) b_(m) a_(f1) b_(f1) a_(f2) b_(f2)] where a_(m)+b_(m)=2, a_(f1)+b_(f1)=1 and a_(f2)+b_(f2)=1. The combination of 3 possible mother genotypes and 4 possible father genotypes means that the measurements from a single chromosome will consist of 24 context means, 12 on each channel.

Consider a copy number hypothesis for the child, for a particular chromosome, of the form (n_(m), n_(f)) where n_(m) is the number of mother copies and n_(f) is the number of father copies of the chromosome. For the phased paternal genotype this hypothesis may be written in the form (n_(m), n_(f1), n_(f2)), where n_(m) is the number of mother copies, n_(f1) is the number of father copies of the first strand, and n_(f2) is the number of father copies of the second strand of the chromosome, so that n_(f)=n_(f1)+n_(f2). (Note: this is different notation than mentioned elsewhere in this disclosure, which is in the form (m,f_(x),f_(y)), and that takes into account the sex chromosome.) Thus the normal disomy hypothesis, previously written in the form (n_(m), n_(f))=(1,1), can be extended into two sub-hypotheses (n_(m), n_(f1), n_(f2))=(1,1,0) and (n_(m), n_(f1), n_(f2))=(1,0,1), where the two paternal haplotypes are differentiated. Maternal trisomy, previously written in the form (2,1), can be extended into sub-hypotheses (2,1,0) and (2,0,1). Paternal trisomy can be extended into sub-hypotheses including paternal mitotic trisomies (1,2,0), (1,0,2) and paternal meiotic trisomy (1,1,1).

Due to possible crossovers between paternal strands, the child hypothesis, written in the form (n_(m), n_(f1), n_(f2)), need not stay the same throughout the chromosome. For example, suppose that a chromosome has normal disomy with the first paternal strand, (1,1,0), on a set of adjacent SNPs. If there is a crossover of paternal strands at the following SNP, the copy number hypothesis of the child changes to (1,0,1), now involving the second father strand.

In order to keep a hypothesis constant over a given set of SNPs for the purpose of calculation, divide the chromosome into N segments of adjacent SNPs. One may divide the chromosomes into segments in a number of ways, for example, to keep the number of SNPs per segment constant, or to keep the number of segments per chromosome constant. Assume here that the copy number hypothesis is constant throughout the segment, with no crossovers present. Ambiguous segments with possible paternal crossovers are omitted in this explanation for clarity.

For each segment, group the measurements by parental context, and aggregate the intensity measurements over each group of SNPs. Therefore, in this case, the measurements from a single chromosome will consist of 24*N context means, 12*N on each channel (for each of N segments on a chromosome).

Let the expected number of As (averaged over SNPs) in the child be k_(x) and the expected number of Bs be k_(y) (for a particular context, conditioned on a hypothesis). The expected number of alleles depends on the context and the hypothesis. For each segment of the chromosome, for each ordered parental context:

$k_x = 0.5\,a_m n_m + a_{f1} n_{f1} + a_{f2} n_{f2}$

$k_y = 0.5\,b_m n_m + b_{f1} n_{f1} + b_{f2} n_{f2}$
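The phased version of the expected allele counts can be sketched the same way, assuming an unphased mother genotype and a phased father genotype written as strand 1 followed by strand 2 (so "BA" means strand 1 carries B). The helper names are hypothetical.

```python
from itertools import product

# Unphased mother as (number of A, number of B); phased father as strand1+strand2.
MOTHER = {"AA": (2, 0), "AB": (1, 1), "BB": (0, 2)}
FATHER = ("AA", "AB", "BA", "BB")

def phased_contexts():
    """The 12 ordered contexts, e.g. 'AB|BA' (father strand 1 = B, strand 2 = A)."""
    return [f"{m}|{f}" for m, f in product(MOTHER, FATHER)]

def expected_child_alleles_phased(context, n_m, n_f1, n_f2):
    """Expected child A and B counts for sub-hypothesis (n_m, n_f1, n_f2)."""
    mother, father = context.split("|")
    a_m, b_m = MOTHER[mother]
    a_f1 = int(father[0] == "A")  # 1 if father strand 1 carries A
    a_f2 = int(father[1] == "A")  # 1 if father strand 2 carries A
    k_x = 0.5 * a_m * n_m + a_f1 * n_f1 + a_f2 * n_f2
    k_y = 0.5 * b_m * n_m + (1 - a_f1) * n_f1 + (1 - a_f2) * n_f2
    return k_x, k_y
```

Under (1,1,0) the AA|AB context gives no B alleles, which is why its y-channel response is expected to be noise only, while under (1,0,1) it gives one B allele.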

The model is similar to the model for unordered parental contexts:

$x_x = \alpha\,(m_x + \delta k_x)$

$x_y = \alpha\,(m_y + \delta k_y)$

$y_x = f_x(x_x) + b_x + v_x$

$y_y = f_y(x_y) + b_y + v_y$

and one may use the quadratic platform response model f(x)=f₁x²+f₂x.

Chromosomes that are assumed to be disomic may be used for fitting the parameters of the model (‘training’ chromosomes), i.e. assume that (n_(m), n_(f))=(1,1). One may determine the exact disomy sub-hypothesis (n_(m), n_(f1), n_(f2)), either (1,1,0) or (1,0,1), on each segment of each ‘training’ chromosome by looking at the intensity responses for different ordered parental contexts, for each segment separately, as follows:

First, determine the noise level for the x channel response by looking at the x channel response for parental context BB|BB, and determine the noise level for the y channel by looking at the y channel response for parental context AA|AA (where the x channel measures A alleles, and the y channel measures B alleles). Then, if the hypothesis is (1,1,0), the y channel responses for ordered parental context AA|AB are expected to be only noise, with no signal, and to have the same behavior as the responses for context AA|AA. Similarly, x channel responses for ordered parental context BB|BA are expected to be only noise, with no signal, and to have the same behavior as the responses for context BB|BB.

If the hypothesis is (1,0,1), the y channel responses for ordered parental context AA|BA should be only noise, with no signal, and have the same behavior as the responses for context AA|AA. Similarly, the x channel responses for ordered parental context BB|AB should be only noise, with no signal, and have the same behavior as the responses for context BB|BB.

Choose as the most likely sub-hypothesis on this segment whichever of (1,1,0) and (1,0,1) fits the data better. Segments where the choice is ambiguous, i.e. segments where a crossover probably occurred, may be omitted from further analysis.
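The text does not fix a particular measure of fit, so the sketch below assumes one: the summed absolute deviation of the two contexts that each sub-hypothesis predicts to be noise-only from the noise baselines (AA|AA on y, BB|BB on x). The `margin` ratio used to declare a segment ambiguous is an illustrative assumption, as is the function name.

```python
def disomy_subhypothesis(means, margin=0.5):
    """Pick (1,1,0) or (1,0,1) for one segment of a training chromosome.

    `means` maps an ordered context (e.g. 'AA|AB') to its (x_mean, y_mean) on
    this segment. Returns None when the two scores are too close to call,
    e.g. because a paternal crossover probably occurred in the segment.
    """
    noise_y = means["AA|AA"][1]  # y baseline: no B alleles present
    noise_x = means["BB|BB"][0]  # x baseline: no A alleles present
    # Each sub-hypothesis predicts two contexts that should look like noise.
    score_110 = abs(means["AA|AB"][1] - noise_y) + abs(means["BB|BA"][0] - noise_x)
    score_101 = abs(means["AA|BA"][1] - noise_y) + abs(means["BB|AB"][0] - noise_x)
    lo, hi = sorted((score_110, score_101))
    if hi == 0 or lo / hi > margin:
        return None
    return (1, 1, 0) if score_110 < score_101 else (1, 0, 1)
```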

In order to train the model using disomic chromosomes, fit the parameters (α, δ, f₁, f₂) for this model from the 12×2×N×n_(t) observations, where n_(t) is the number of ‘training’ chromosomes used.

In an embodiment, the focus is placed on detection of trisomies, where one parent contributes an extra copy. Note that most viable aneuploid births result from trisomies. Hypothesis fitting on ‘test’ chromosomes (the chromosome of interest) may be done as for unordered genotypes, except that each sub-hypothesis (for example (1,0,1) vs. (1,1,0)) may be fit separately for each segment. The segment-level results may then be aggregated, focusing only on the overall ploidy state (now considering (1,0,1) and (1,1,0) to be the same, both disomy; and focusing on, for example, disomy vs. maternal trisomy vs. paternal trisomy), and statistics may be calculated for whole chromosomes.

In particular, suppose that, on segment i, the probability of a particular sub-hypothesis in ordered hypothesis format is P_(i)(n_(m),n_(f1),n_(f2)). In unordered hypothesis format, calculate the probability of the disomy hypothesis as P_(i)(n_(m),n_(f))=P_(i)(1,1)=P_(i)(1,1,0)+P_(i)(1,0,1). For maternal trisomy, P_(i)(2,1)=P_(i)(2,1,0)+P_(i)(2,0,1). For paternal trisomy, P_(i)(1,2)=p_(mt)*(P_(i)(1,2,0)+P_(i)(1,0,2))+p_(me)*P_(i)(1,1,1), where p_(mt) is the probability of mitotic paternal trisomy given that paternal trisomy occurred, and p_(me) is the probability of meiotic paternal trisomy given that paternal trisomy occurred, both determined from literature and general practice. Note that mitotic trisomies and meiotic trisomies may or may not be differentiated.

Given the hypothesis probability P_(i)(n_(m),n_(f)) for each segment i=1, . . . , N, calculate the probability over the whole chromosome as P(n_(m),n_(f))=Π_(i=1, . . . , N) P_(i)(n_(m),n_(f)). The hypothesis call for each chromosome is made by selecting the hypothesis with the highest probability.
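The segment-level collapse and whole-chromosome product can be sketched as follows, assuming each segment's sub-hypothesis probabilities are given as a dict keyed by (n_m, n_f1, n_f2). Summing logarithms instead of multiplying many small probabilities is an implementation choice, not part of the text; function names are hypothetical.

```python
import math

def unordered_probs(seg, p_mt, p_me):
    """Collapse ordered sub-hypotheses on one segment to (n_m, n_f) form.

    `seg` maps (n_m, n_f1, n_f2) to its probability on the segment; missing
    sub-hypotheses are treated as probability 0.
    """
    def get(h):
        return seg.get(h, 0.0)

    return {
        (1, 1): get((1, 1, 0)) + get((1, 0, 1)),
        (2, 1): get((2, 1, 0)) + get((2, 0, 1)),
        (1, 2): p_mt * (get((1, 2, 0)) + get((1, 0, 2))) + p_me * get((1, 1, 1)),
    }

def call_chromosome(segments, p_mt, p_me):
    """Multiply per-segment probabilities (as a sum of logs) and return the
    (n_m, n_f) hypothesis with the highest product."""
    log_total = {}
    for seg in segments:
        for h, p in unordered_probs(seg, p_mt, p_me).items():
            log_p = math.log(p) if p > 0 else -math.inf
            log_total[h] = log_total.get(h, 0.0) + log_p
    return max(log_total, key=log_total.get)
```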

It should be obvious, given the benefit of this disclosure, how to modify the method for a case where the maternal genotype is phased and the paternal genotype is not phased. It should also be obvious, given the benefit of this disclosure, how to modify the method for a case where both the paternal and the maternal genotypes are phased.

Experimental Section

The experimental aspect of the invention is described here. In order to demonstrate the reduction to practice of the invention, a mixture of genomic DNA from multiple individuals was made, where the ploidy state of each individual was known, and the algorithms described above were used to determine the ploidy state of one of the individuals.

Genomic samples were prepared from a maternal (AG16778, CORIELL) and an offspring (AG16777, CORIELL) tissue culture cell line. Cells were grown under standard conditions (1×RPMI Medium 1640, 15% Fetal Bovine Serum (FBS), 0.85% Streptomycin), and genomic DNA was purified using a QIAAMP DNA Micro Kit (QIAGEN) according to the manufacturer's recommendations. Purified DNA was quantified using a NANODROP instrument (THERMO SCIENTIFIC) and diluted to appropriate concentrations in 1×Tris-EDTA buffer. A series of three mixed genomic samples (a-c) was prepared by combining (a) 59.4 ng AG16777 DNA with 132.6 ng AG16778 DNA (30% AG16777), (b) 76.8 ng AG16777 DNA with 115.2 ng AG16778 DNA (40% AG16777), and (c) 115.2 ng AG16777 DNA with 76.8 ng AG16778 DNA (60% AG16777). The three samples were diluted in H₂O to a total DNA concentration of 3 ng/µl. Samples were stored at −20 °C and then analyzed on the INFINIUM array platform (ILLUMINA) according to the manufacturer's recommendations.
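The mixing arithmetic can be checked with a few lines of Python. The sketch assumes that the entire 192 ng of each mixture ends up at the stated final concentration of 3 ng/µl; the function name is illustrative. It shows that every sample contains 192 ng in total and that sample (a) is about 30.9% offspring DNA, which the text rounds to 30%.

```python
def mixture_summary(child_ng, mother_ng, final_conc_ng_per_ul=3.0):
    """Offspring DNA fraction and final volume (ul) for one premixed sample."""
    total_ng = child_ng + mother_ng
    return child_ng / total_ng, total_ng / final_conc_ng_per_ul

# Offspring (AG16777) and maternal (AG16778) DNA amounts in ng, from the text.
samples = {"a": (59.4, 132.6), "b": (76.8, 115.2), "c": (115.2, 76.8)}
for name, amounts in samples.items():
    fraction, volume_ul = mixture_summary(*amounts)
    print(f"({name}) offspring fraction {fraction:.1%}, volume {volume_ul:.1f} µl")
```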

This method is appropriate for any nucleic acids which may be used for the ILLUMINA INFINIUM array platform, or any other SNP-based genotyping method, for example isolated free-floating DNA from plasma or amplifications (e.g. whole genome amplification, PCR) of the same, isolated genomic DNA from other cell types (e.g. lymphocytes) or amplifications of the same. Any method that generates genomic DNA (e.g. extraction of DNA, purification) may be used for sample preparation.

The genomic DNA used here was premixed to simulate a mix of fetal and maternal DNA; however, the method is also applicable to DNA (or amplifications thereof) as such (i.e. not premixed). Three samples were prepared from these cell lines, having 30, 40 and 60 percent of offspring DNA (relative to the mother). The offspring cell line has trisomy on chromosome 21.

FIGS. 1A and 1B show the model parameter fit for (b), the 40 percent sample. The x-axis shows the total number of alleles on the channel of interest, m̂+δk̂. These values range from zero to four. Considering the x channel, there are no expected alleles in the BB|BB context, ranging to four expected alleles in the AA|AA context, with two from the mother's DNA and two from the child's DNA. The y-axis measures platform response as a function of the number of alleles. Circles are the measured context means (9 on each channel from each of the assumed disomic chromosomes) and the line shows the corresponding value predicted by the model parameters p*, for the same number of alleles. Note that the y-axis values on the two plots are quite different, showing that the x and y channel responses must be modeled separately.

FIG. 2 shows the 18 components of the measurement y₁₆ from chromosome 16 on the sample with 40 percent fetal DNA. The first nine measurements are from the x channel and the next nine measurements are from the y channel. The contexts are ordered as follows: AA|AA, AA|AB, AA|BB, AB|AA, AB|AB, AB|BB, BB|AA, BB|AB, BB|BB. The 18 measurements are compared to the predicted values for the three hypotheses H11, H12 and H21. It is clear that the data most closely matches the H11 hypothesis (disomy). The correct call was produced by the algorithm, with assigned probability of 1.0 based on a uniform prior distribution.

FIG. 3 shows chromosome 21, which has a truth of H21. The correct call was also made with assigned probability 1.0. The complete set of hypothesis calls and assigned probabilities is shown in Table 1. The context mean measurements for the classified chromosomes for samples (a), (b), and (c) are shown in Tables 2, 3 and 4, respectively. In these tables, columns correspond to the chromosomes and rows correspond to the context mean measurements, ordered as described for FIG. 2 by channel and then by context.

TABLE 1 Algorithm hypothesis calls and assigned probabilities for each classified chromosome. The correct hypothesis for each chromosome is shown in the column header.

Sample      ch16 (H11)  ch17 (H11)  ch18 (H11)  ch19 (H11)  ch20 (H11)  ch21 (H21)  ch22 (H11)
30% child   H11 (1.0)   H11 (1.0)   H11 (1.0)   H11 (1.0)   H11 (1.0)   H21 (1.0)   H11 (1.0)
40% child   H11 (1.0)   H11 (1.0)   H11 (1.0)   H11 (1.0)   H11 (1.0)   H21 (1.0)   H11 (1.0)
60% child   H11 (1.0)   H11 (1.0)   H11 (1.0)   H11 (1.0)   H11 (1.0)   H21 (1.0)   H11 (1.0)

TABLE 2 Context means from sample (a) with 30 percent child DNA

Context        ch16     ch17     ch18     ch19     ch20     ch21     ch22
              (H11)    (H11)    (H11)    (H11)    (H11)    (H21)    (H11)
x AA|AA     13391.0  13396.3  12737.6  12610.0  14139.3  13669.2  13319.9
x AA|AB     12720.2  12986.3  12257.6  11849.5  13484.8  12696.4  13033.4
x AA|BB     12474.1  12259.0  11153.3  11295.9  13145.1  13000.0  11616.9
x AB|AA     10096.4  10076.4   9118.7  10133.4  10396.8  10062.3  10119.0
x AB|AB      9231.1   9342.0   8880.2   8523.3   9370.7   9120.1   8874.9
x AB|BB      8133.1   7753.6   7552.1   7611.9   8629.2   8504.7   7635.7
x BB|AA      4809.1   4907.5   4522.1   4103.8   4723.6   4222.9   4482.7
x BB|AB      2778.9   2989.5   2708.7   2926.5   3029.2   2678.2   2813.7
x BB|BB       932.0    924.6    915.3    930.7    936.7    955.1    921.0
y AA|AA      1530.8   1520.7   1452.7   1514.7   1557.4   1467.0   1507.1
y AA|AB      4897.6   4880.0   4428.4   4652.7   5139.2   4370.2   4880.5
y AA|BB      7991.4   7858.6   7259.3   7149.2   8376.7   7583.1   7284.4
y AB|AA     12680.9  12625.4  12151.6  12257.1  13690.0  13237.8  12740.3
y AB|AB     14408.8  14339.7  14093.1  13331.4  14892.0  14312.9  14088.5
y AB|BB     15857.3  15468.9  14785.9  15413.2  16198.5  15942.9  15458.1
y BB|AA     19034.9  19216.9  18527.3  17459.5  19282.5  19296.2  18271.1
y BB|AB     19949.6  20399.3  19282.9  19156.4  21229.8  20667.8  20356.4
y BB|BB     20759.7  20659.5  19992.0  19859.9  21461.8  21454.6  20586.5

TABLE 3 Context means from sample (b) with 40 percent child DNA

Context        ch16     ch17     ch18     ch19     ch20     ch21     ch22
              (H11)    (H11)    (H11)    (H11)    (H11)    (H21)    (H11)
x AA|AA     12550.6  12579.9  11992.0  11803.7  13301.4  13019.7  12486.2
x AA|AB     11761.7  12087.9  11409.0  10959.7  12553.3  11961.6  12050.4
x AA|BB     11470.2  11190.0  10345.8  10184.9  11997.4  12311.9  10620.9
x AB|AA      9730.8   9724.5   8820.5   9744.3  10006.4   9796.9   9635.1
x AB|AB      8663.3   8788.1   8336.2   8028.7   8812.4   8652.7   8313.7
x AB|BB      7374.2   6921.7   6858.2   6788.7   7869.5   7931.6   6872.3
x BB|AA      5240.3   5318.0   4882.5   4355.9   5226.3   4499.0   4848.2
x BB|AB      2876.9   3081.3   2786.8   3032.0   3133.0   2721.5   2884.9
x BB|BB       747.0    739.0    725.6    742.6    754.5    775.7    735.5
y AA|AA      1215.7   1202.2   1139.8   1195.3   1247.6   1162.4   1195.4
y AA|AB      5141.4   5141.1   4647.9   4884.0   5419.2   4491.4   5066.3
y AA|BB      8788.4   8589.7   8054.1   7676.0   9180.7   8239.9   7860.0
y AB|AA     11789.9  11721.3  11387.4  11299.3  12930.0  12800.3  11749.7
y AB|AB     14003.2  13876.2  13700.7  12946.6  14557.5  14022.1  13684.1
y AB|BB     15911.6  15468.6  14791.0  15421.7  16203.6  16097.6  15506.3
y BB|AA     18219.5  18507.0  17671.1  16562.4  18738.9  18920.1  17571.3
y BB|AB     19453.4  19745.8  18751.0  18557.6  20749.5  20360.6  19775.2
y BB|BB     20405.2  20347.0  19747.6  19406.0  21205.3  21409.5  20303.8

TABLE 4 Context means from sample (c) with 60 percent child DNA

Context        ch16     ch17     ch18     ch19     ch20     ch21     ch22
              (H11)    (H11)    (H11)    (H11)    (H11)    (H21)    (H11)
x AA|AA     14453.4  14433.9  13747.4  13574.8  15284.4  15340.9  14299.5
x AA|AB     13022.7  13352.4  12687.0  11874.1  13907.0  13834.7  13265.3
x AA|BB     12211.6  11774.2  10772.2  10840.1  12664.4  13763.4  11278.3
x AB|AA     11674.6  11651.0  10614.0  11718.0  11951.0  11943.3  11750.2
x AB|AB      9652.1   9865.7   9356.0   9078.6   9857.3  10296.7   9339.5
x AB|BB      7456.1   6979.7   7149.0   6720.3   8212.8   8899.0   6930.3
x BB|AA      7521.5   7710.6   7260.6   6449.0   7607.0   6434.7   7132.0
x BB|AB      4117.2   4415.2   4005.3   4324.8   4473.7   3862.9   4128.1
x BB|BB       860.4    849.9    833.1    864.3    863.7    894.7    846.3
y AA|AA      1360.4   1364.5   1275.6   1349.8   1398.1   1353.4   1345.3
y AA|AB      7185.7   7019.5   6493.3   6789.4   7586.5   6193.1   6964.6
y AA|BB     12481.0  12200.9  11442.4  11061.2  13030.5  11304.5  11423.5
y AB|AA     11840.0  11758.5  11671.3  11148.7  13512.2  14165.6  11967.1
y AB|AB     15575.7  15291.0  15286.0  14449.9  16439.0  16386.7  15302.0
y AB|BB     19001.6  18522.8  17684.2  18689.6  19325.3  19377.0  18569.2
y BB|AA     19158.2  19308.5  18786.6  17259.8  19780.8  21111.1  18409.1
y BB|AB     21537.7  21797.0  20716.9  20383.8  23039.6  23429.0  21894.6
y BB|BB     23523.5  23412.2  22665.6  22280.1  24448.8  25148.6  23313.8

In one embodiment, identification of the parent haplotypes (parent phase) may be used to estimate the recombination locations that determine which haplotypes are present in the child. Identifying which parental haplotype is present at each position in the child determines the child genotype. This may result in lower model variances, because positions with different child genotypes will no longer be averaged together. Certain methods disclosed herein can be modified to detect meiotic trisomies, in which both of a parent's haplotypes are present.
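The relationship between phased parental haplotypes, the haplotype inherited at each SNP, and the resulting child genotype can be sketched as follows. This is a minimal illustration that assumes the parental haplotypes are already phased and that the index of the inherited haplotype at each SNP is known; estimating those indices from measured data is not shown, and the function names are hypothetical. A change in the inherited index between adjacent SNPs marks a recombination location.

```python
from typing import List, Sequence


def child_genotype(
    maternal_haps: Sequence[str],
    paternal_haps: Sequence[str],
    maternal_choice: Sequence[int],
    paternal_choice: Sequence[int],
) -> List[str]:
    """Child genotype at each SNP, given which parental haplotype was inherited.

    maternal_haps / paternal_haps: the two phased haplotypes of each parent,
    one allele ('A' or 'B') per SNP. *_choice[i] is 0 or 1, the haplotype
    transmitted at SNP i (hypothetical inputs; estimating them is not shown).
    """
    genotypes = []
    for i in range(len(maternal_choice)):
        m = maternal_haps[maternal_choice[i]][i]
        p = paternal_haps[paternal_choice[i]][i]
        genotypes.append("".join(sorted(m + p)))  # unordered genotype, e.g. "AB"
    return genotypes


def crossover_positions(choice: Sequence[int]) -> List[int]:
    """SNP indices where the inherited haplotype differs from the previous SNP."""
    return [i for i in range(1, len(choice)) if choice[i] != choice[i - 1]]


# Example: a maternal crossover between SNP 1 and SNP 2.
mat = ("AAAA", "BBBB")
pat = ("AAAA", "AAAA")
print(child_genotype(mat, pat, [0, 0, 1, 1], [0, 0, 0, 0]))  # ['AA', 'AA', 'AB', 'AB']
print(crossover_positions([0, 0, 1, 1]))  # [2]
```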

Some Embodiments

In some embodiments of the present disclosure, a method for determining the ploidy state of one or more chromosomes in a target individual may include any of the following steps, and combinations thereof:

In some embodiments, genetic data from the target individual and from one or more related individuals may be obtained. In one embodiment, the related individuals include both parents of the target individual. In one embodiment, the related individuals include siblings of the target individual. In one embodiment, the related individuals may include the parents and one or more grandparents. This genetic data may be obtained from data in silico; it may be output data from an informatics method designed to clean genetic data, or it may come from other sources. In some embodiments of the invention, the genotypic data of the parents can be obtained and optionally phased using methods found in the three patent applications, Rabinowitz 2006, 2008 and 2009, referenced elsewhere in this application. Any number of methods may be used to obtain the parental genotypic data, provided that the set of SNPs measured on the mixed sample of fetal and maternal DNA overlaps sufficiently with the set of SNPs for which the parental genotypes are known.

Amplification of the DNA, a process that transforms a small amount of genetic material into a larger amount of genetic material containing a similar set of genetic data, can be done by a wide variety of methods, including, but not limited to, Polymerase Chain Reaction (PCR), ligation-mediated PCR, degenerate oligonucleotide primed PCR, Multiple Displacement Amplification, allele-specific amplification techniques, Molecular Inversion Probes (MIP), padlock probes, other circularizing probes, and combinations thereof. Many variants of the standard protocol may be used, for example increasing or decreasing the duration or temperature of certain steps, or the amounts of various reagents. DNA amplification transforms the initial sample of DNA into a sample that is similar in its set of sequences, but of much greater quantity. In some cases, amplification may not be required.

The genetic data of the target individual and/or of the related individual can be transformed from a molecular state to an electronic state by measuring the appropriate genetic material using tools and/or techniques taken from a group including, but not limited to: genotyping microarrays, APPLIED BIOSYSTEMS' TAQMAN SNP genotyping assay, the ILLUMINA genotyping system, for example the ILLUMINA BEADARRAY platform using, for example, the 1M-DUO chip, an AFFYMETRIX GENOTYPING PLATFORM, such as the AFFYMETRIX 6.0 GENECHIP, AFFYMETRIX's GENFLEX TAG array, and other genotyping microarrays. A sequencing method may also be used, such as Sanger DNA sequencing, or a high-throughput method such as pyrosequencing, the ILLUMINA SOLEXA platform, ILLUMINA's GENOME ANALYZER, ROCHE's 454 sequencing platform, HELICOS's TRUE SINGLE MOLECULE SEQUENCING platform, or any other sequencing method. Other suitable techniques include fluorescent in-situ hybridization (FISH), array comparative genomic hybridization (CGH), other high-throughput genotyping platforms, and combinations thereof. All of these methods physically transform the genetic data stored in a sample of DNA into a set of genetic data that is typically stored in a memory device en route to being processed.

Any relevant individual's genetic data can be measured by analyzing substances taken from a group including, but not limited to: the individual's bulk diploid tissue, one or more diploid cells from the individual, one or more haploid cells from the individual, one or more blastomeres from the target individual, extra-cellular genetic material found on the individual, extra-cellular genetic material from the individual found in maternal blood, cells from the individual found in maternal blood, one or more embryos created from one or more gametes from the related individual, one or more blastomeres taken from such an embryo, extra-cellular genetic material found on the related individual, genetic material known to have originated from the related individual, and combinations thereof.

In some embodiments, a set of at least one ploidy state hypothesis may be created for each of the chromosomes of interest of the target individual. Each of the ploidy state hypotheses may refer to one possible ploidy state of the chromosome or chromosome segment of the target individual. The set of hypotheses may include some or all of the possible ploidy states that the chromosome of the target individual may be expected to have. Some of the possible ploidy states may include nullsomy, monosomy, disomy, uniparental disomy, euploidy, trisomy, matching trisomy, unmatching trisomy, maternal trisomy, paternal trisomy, tetrasomy, balanced (2:2) tetrasomy, unbalanced (3:1) tetrasomy, other aneuploidy, and they may additionally involve unbalanced translocations, balanced translocations, Robertsonian translocations, recombinations, deletions, insertions, crossovers, and combinations thereof.
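One way to represent such a hypothesis set, sketched below under stated assumptions, is to index each hypothesis by the number of chromosome copies inherited from each parent, which matches the assumption elsewhere in this disclosure that zero, one or two copies originate from each parent. The function name and labels are illustrative. Distinguishing matching from unmatching trisomy would additionally require tracking whether two copies from the same parent are identical or different homologs, which this sketch does not do.

```python
from itertools import product
from typing import Dict, Tuple


def ploidy_hypotheses(max_per_parent: int = 2) -> Dict[Tuple[int, int], str]:
    """Map (maternal copies, paternal copies) to a coarse ploidy label."""
    labels = {}
    for m, p in product(range(max_per_parent + 1), repeat=2):
        total = m + p
        if total == 0:
            label = "nullsomy"
        elif total == 1:
            label = "monosomy"
        elif total == 2:
            # Two copies from one parent and none from the other.
            label = "disomy" if m == p else "uniparental disomy"
        elif total == 3:
            label = "maternal trisomy" if m > p else "paternal trisomy"
        elif total == 4:
            label = "balanced (2:2) tetrasomy" if m == p else "unbalanced tetrasomy"
        else:
            label = f"{total} copies"
        labels[(m, p)] = label
    return labels


for state, label in ploidy_hypotheses().items():
    print(state, label)
```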

In some embodiments, the set of determined probabilities may then be combined. For each hypothesis, this may entail averaging or multiplying the probabilities as determined by each technique, and it may also involve normalizing across the hypotheses. In some embodiments, the probabilities may be combined under the assumption that they are independent, in which case they are multiplied: the set of products of the probabilities for each hypothesis in the set of hypotheses is then output as the combined probabilities of the hypotheses.
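A minimal sketch of the two combination rules described above, assuming each technique reports a probability for every hypothesis under the same hypothesis names; the function names are hypothetical. The product rule corresponds to the independence assumption; the averaging rule is the alternative mentioned in the text.

```python
import math
from typing import Dict, List


def combine_independent(per_technique: List[Dict[str, float]]) -> Dict[str, float]:
    """Multiply each hypothesis's probabilities across techniques, then normalize."""
    hypotheses = list(per_technique[0])
    products = {h: math.prod(p[h] for p in per_technique) for h in hypotheses}
    total = sum(products.values())
    return {h: v / total for h, v in products.items()}


def combine_average(per_technique: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each hypothesis's probabilities across techniques, then normalize."""
    hypotheses = list(per_technique[0])
    means = {h: sum(p[h] for p in per_technique) / len(per_technique) for h in hypotheses}
    total = sum(means.values())
    return {h: v / total for h, v in means.items()}


snp_based = {"disomy": 0.8, "trisomy": 0.2}
other_method = {"disomy": 0.6, "trisomy": 0.4}
print(combine_independent([snp_based, other_method]))
print(combine_average([snp_based, other_method]))
```

With many techniques or very small probabilities, summing logarithms instead of multiplying raw values would avoid floating-point underflow.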

In some embodiments of the invention, the probabilities determined by the method disclosed herein may be combined with probabilities of other hypotheses that are calculated by diagnostic methods that work on entirely different principles. For example, some methods used for prenatal diagnosis involve measuring the levels of certain hormones in maternal blood, where those hormones are correlated with various genetic abnormalities. Some examples of this are the first trimester serum screen, the triple test, and the quad test. Some methods involve measuring dimensions and other qualities of the fetus that can be measured using ultrasound, for example, the nuchal translucency. Some of these methods can calculate a probability that the fetus is euploid, or is afflicted with trisomy, especially trisomy 18 and/or trisomy 21. In a case where multiple methods are used to determine the likelihood of a given outcome, and none of the methods is definitive in and of itself, it is possible to combine the information given by those methods to make a prediction that is more accurate than that of any of the individual methods. For example, in the triple test, combining the information given by the three different hormones can result in a prediction of genetic abnormalities that is more accurate than any of the individual hormone levels may predict. In some embodiments, the method involves measuring maternal blood levels of alpha-fetoprotein (AFP). In some embodiments, the method may involve measuring maternal blood levels of unconjugated estriol (UE₃). In some embodiments, the method may involve measuring maternal blood levels of beta human chorionic gonadotropin (β-hCG). In some embodiments, the method may involve measuring maternal blood levels of invasive trophoblast antigen (ITA). In some embodiments, the method may involve measuring maternal blood levels of inhibin-A. In some embodiments, the method may involve measuring maternal blood levels of pregnancy-associated plasma protein A (PAPP-A).
In some embodiments, the method may involve measuring maternal blood levels of other hormones or maternal serum markers. In some embodiments, some of the predictions may have been made using other methods. In some embodiments, some of the predictions may have been made using a fully integrated test, such as one that combines an ultrasound and a blood test at about 10-14 weeks of pregnancy with a second blood test at about 15-20 weeks. In some embodiments, the method involves measuring the fetal nuchal translucency (NT) by ultrasound. In some embodiments, the method involves using the measured levels of the aforementioned hormones for making predictions. In some embodiments, the method involves a combination of the aforementioned methods.

The output of the method described herein can be combined with the output of one or a plurality of other methods. There are many ways to combine the predictions; for example, one could convert the hormone measurements into multiples of the median (MoM) and then into likelihood ratios (LRs). NT measurements could similarly be transformed into LRs using a mixture model of NT distributions. The LRs for NT and the biochemical markers could be multiplied by the age- and gestation-related risk to derive the risk for various conditions, such as trisomy 21. Detection rates (DRs) and false-positive rates (FPRs) could then be calculated as the proportions of affected and unaffected pregnancies, respectively, whose risks are above a given risk threshold.
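The MoM-to-LR-to-risk chain described above can be sketched as follows. This is an illustration only: it assumes Gaussian distributions of log10(MoM) in affected and unaffected pregnancies, whose parameters must come from reference data (the values in the example call are placeholders, not published parameters), and it applies the LRs to the prior risk on the odds scale, which is the usual way a prior risk is "multiplied" by likelihood ratios. The NT mixture model is not implemented, and all function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence


def mom(value: float, gestational_median: float) -> float:
    """Multiple of the median: marker level divided by the median for that gestational age."""
    return value / gestational_median


def likelihood_ratio(m: float, affected: NormalDist, unaffected: NormalDist) -> float:
    """LR of a MoM value, using Gaussian models of log10(MoM) in each group."""
    x = math.log10(m)
    return affected.pdf(x) / unaffected.pdf(x)


def posterior_risk(prior_risk: float, lrs: Sequence[float]) -> float:
    """Apply independent LRs to a prior risk (a probability) on the odds scale."""
    odds = prior_risk / (1.0 - prior_risk)
    for lr in lrs:
        odds *= lr
    return odds / (1.0 + odds)


def rate_above(risks: Sequence[float], threshold: float) -> float:
    """Share of risks above the threshold: the DR for affected cases, the FPR for unaffected ones."""
    return sum(r > threshold for r in risks) / len(risks)


# Placeholder distribution parameters, for illustration only.
affected = NormalDist(mu=0.3, sigma=0.25)
unaffected = NormalDist(mu=0.0, sigma=0.25)
lr = likelihood_ratio(mom(2.0, 1.0), affected, unaffected)
print(posterior_risk(1 / 500, [lr]))
```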

One embodiment may involve a situation with four measured hormone levels, where the probability distribution of those hormone levels is known: p(x₁, x₂, x₃, x₄|e) for the euploid case and p(x₁, x₂, x₃, x₄|a) for the aneuploid case. Then one could estimate the probability distribution for the DNA measurements, g(y|e) and g(y|a), for the euploid and aneuploid cases respectively. Assuming that the hormone and DNA measurements are independent given the euploid/aneuploid state, one could combine them as p(x₁, x₂, x₃, x₄|a)g(y|a) and p(x₁, x₂, x₃, x₄|e)g(y|e), and then multiply each by the prior p(a) or p(e), respectively, given the maternal age. One could then choose the case with the highest probability. In one embodiment, it is possible to invoke the central limit theorem to assume that the distribution g(y|a or e) is Gaussian, and to estimate its mean and standard deviation by looking at multiple samples. In another embodiment, one could assume that the measurements are not independent given the outcome, and collect enough samples to estimate the joint distribution p(x₁, x₂, x₃, x₄, y|a or e).
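The independent-combination rule described above can be sketched as follows, assuming the hormone likelihoods p(x₁, x₂, x₃, x₄|a) and p(x₁, x₂, x₃, x₄|e) are already computed by a separate model, that the DNA score y is a single number whose class-conditional distributions g(y|a) and g(y|e) are Gaussian (the central-limit assumption above), and that the hormone and DNA measurements are conditionally independent. The function names and the training scores in the example are hypothetical.

```python
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def fit_gaussian(training_scores: Sequence[float]) -> NormalDist:
    """Estimate g(y | class) as a Gaussian from DNA scores of samples of one class."""
    return NormalDist(mean(training_scores), stdev(training_scores))


def classify(
    p_x_given_a: float,  # p(x1, x2, x3, x4 | a) from the hormone model
    p_x_given_e: float,  # p(x1, x2, x3, x4 | e) from the hormone model
    y: float,            # DNA-based score for this sample
    g_a: NormalDist,     # g(y | a)
    g_e: NormalDist,     # g(y | e)
    prior_a: float,      # p(a) for this maternal age; p(e) = 1 - p(a)
) -> Tuple[str, float]:
    """Return the more probable case and the posterior probability of aneuploidy."""
    joint_a = p_x_given_a * g_a.pdf(y) * prior_a
    joint_e = p_x_given_e * g_e.pdf(y) * (1.0 - prior_a)
    post_a = joint_a / (joint_a + joint_e)
    return ("aneuploid" if post_a > 0.5 else "euploid"), post_a


# Hypothetical training scores for each class.
g_e = fit_gaussian([0.0, 0.1, -0.1, 0.05, -0.05])
g_a = fit_gaussian([1.0, 1.1, 0.9, 1.05, 0.95])
print(classify(1.0, 1.0, 0.2, g_a, g_e, prior_a=0.01))
```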

In one embodiment, the ploidy state for the target individual is determined to be the ploidy state that is associated with the hypothesis whose probability is the greatest. In some cases, one hypothesis will have a normalized, combined probability greater than 90%. Each hypothesis is associated with one ploidy state or a set of ploidy states. A threshold such as 90%, or some other value such as 50%, 80%, 95%, 98%, 99%, or 99.9%, may be chosen as the normalized, combined probability required for a hypothesis to be called, and the ploidy state associated with a hypothesis that exceeds this threshold is then reported as the determined ploidy state.
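A minimal sketch of the threshold rule, assuming the hypothesis probabilities are available as a dictionary; the function name is hypothetical, and returning None to mean "no call" is a design choice of this sketch rather than something the text prescribes.

```python
from typing import Dict, Optional


def call_ploidy(probabilities: Dict[str, float], threshold: float = 0.90) -> Optional[str]:
    """Return the most probable hypothesis if its normalized probability exceeds
    the threshold; otherwise return None (no call)."""
    total = sum(probabilities.values())
    best = max(probabilities, key=probabilities.get)
    return best if probabilities[best] / total > threshold else None


print(call_ploidy({"disomy": 0.96, "trisomy": 0.04}))        # disomy
print(call_ploidy({"disomy": 0.70, "trisomy": 0.30}))        # None
print(call_ploidy({"disomy": 0.70, "trisomy": 0.30}, 0.50))  # disomy
```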

In some embodiments, the knowledge of the determined ploidy state may be used to make a clinical decision. This knowledge, typically stored as a physical arrangement of matter in a memory device, may then be transformed into a report. The report may then be acted upon. For example, the clinical decision may be to terminate the pregnancy; alternately, the clinical decision may be to continue the pregnancy. In some embodiments, the clinical decision may involve an intervention designed to decrease the severity of the phenotypic presentation of a genetic disorder.

In some cases, it may be desirable to include a large number of related individuals in the calculation to determine the most likely genetic state of a target. In some cases, running the algorithm with all of the desired related individuals may not be feasible due to limits of computational power or time. The computing power needed to calculate the most likely allele values for the target may increase exponentially with the number of sperm, blastomeres, and other input genotypes from related individuals. In one embodiment, these problems may be overcome by using a method termed “subsetting”, where the computations may be divided into smaller sets, run separately, and then combined. In one embodiment of the present disclosure, one may have the genetic data of the parents along with that of ten embryos and ten sperm. In this embodiment, one could run several smaller sub-algorithms with, for example, three embryos and three sperm each, and then pool the results. In one embodiment, the number of sibling embryos used in the determination may be from one to three, from three to five, from five to ten, from ten to twenty, or more than twenty. In one embodiment, the number of sperm whose genetic data is known may be from one to three, from three to five, from five to ten, from ten to twenty, or more than twenty. In one embodiment, each chromosome may be divided into two to five, five to ten, ten to twenty, or more than twenty subsets.
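The subsetting idea can be sketched as follows. The sub-algorithm is a stand-in callable (hypothetical), and pooling the subset results by averaging is an assumption of this sketch; the text does not specify how the results are pooled. Under an assumed cost that grows as 2^n in the number of inputs n, one run over 20 inputs would cost on the order of 2^20 units, while four runs over at most 6 inputs each would cost at most 4 x 2^6.

```python
from itertools import zip_longest
from statistics import mean
from typing import Callable, Dict, List, Sequence

SubAlgorithm = Callable[[List[str], List[str]], Dict[str, float]]


def chunk(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive groups of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def run_subsetted(
    embryos: Sequence[str],
    sperm: Sequence[str],
    sub_algorithm: SubAlgorithm,
    size: int = 3,
) -> Dict[str, float]:
    """Run sub_algorithm on small embryo/sperm subsets and pool by averaging."""
    # zip_longest keeps leftover subsets when the two lists differ in length.
    pairs = zip_longest(chunk(embryos, size), chunk(sperm, size), fillvalue=[])
    results = [sub_algorithm(e, s) for e, s in pairs]
    return {key: mean(r[key] for r in results) for key in results[0]}


def toy_sub_algorithm(embryo_ids: List[str], sperm_ids: List[str]) -> Dict[str, float]:
    # Stand-in for the real per-subset computation: reports subset size only.
    return {"inputs": float(len(embryo_ids) + len(sperm_ids))}


embryos = [f"embryo{i}" for i in range(10)]
sperm = [f"sperm{i}" for i in range(10)]
print(run_subsetted(embryos, sperm, toy_sub_algorithm))  # {'inputs': 5.0}
```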

In one embodiment of the invention, any of the methods described herein may be modified to allow for multiple targets to come from the same target individual, for example, multiple blood draws from the same pregnant mother. This may improve the accuracy of the model, as multiple genetic measurements may provide more data with which the target genotype may be determined. In one embodiment, one set of target genetic data serves as the primary data that is reported, and the other serves as data to double-check the primary target genetic data. In one embodiment, a plurality of sets of genetic data, each measured from genetic material taken from the target individual, are considered in parallel, and thus each set of target genetic data helps to determine which sections of the parental genetic data, measured with high accuracy, compose the fetal genome.

In some embodiments, the source of the genetic material to be used in determining the genetic state of the fetus may be fetal cells, such as nucleated fetal red blood cells, isolated from the maternal blood. The method may involve obtaining a blood sample from the pregnant mother. The method may involve isolating a fetal red blood cell using visual techniques, based on the idea that a certain combination of colors is uniquely associated with nucleated red blood cells, and a similar combination of colors is not associated with any other cell present in the maternal blood. The combination of colors associated with the nucleated red blood cells may include the red color of the hemoglobin around the nucleus, which color may be made more distinct by staining, and the color of the nuclear material, which can be stained, for example, blue. By isolating the cells from maternal blood and spreading them over a slide, and then identifying those points at which one sees both red (from the hemoglobin) and blue (from the nuclear material), one may be able to identify the location of nucleated red blood cells. One may then extract those nucleated red blood cells using a micromanipulator, and use genotyping and/or sequencing techniques to measure aspects of the genotype of the genetic material in those cells.
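The red-and-blue co-localization step can be sketched on an RGB image of the slide as follows. This is an illustrative sketch only: the color thresholds, the neighborhood radius, and the function name are hypothetical and would have to be tuned to the actual stains and imaging setup. It flags blue (nuclear) pixels that have a red (hemoglobin) pixel nearby, which under the reasoning above marks candidate nucleated red blood cells for micromanipulation.

```python
import numpy as np


def nucleated_rbc_mask(
    rgb: np.ndarray,
    radius: int = 3,
    red_min: int = 150,
    blue_min: int = 150,
    other_max: int = 100,
) -> np.ndarray:
    """Boolean mask of blue (nuclear) pixels with a red (hemoglobin) pixel within
    `radius` pixels (square neighborhood). All thresholds are hypothetical."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    red = (r >= red_min) & (g <= other_max) & (b <= other_max)
    blue = (b >= blue_min) & (r <= other_max) & (g <= other_max)

    # Dilate the red mask: OR together every shift within the square window.
    h, w = red.shape
    padded = np.pad(red, radius)  # pads with False
    near_red = np.zeros_like(red)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            near_red |= padded[dy:dy + h, dx:dx + w]
    return blue & near_red


# Toy slide: one nucleus next to hemoglobin, one isolated nucleus.
slide = np.zeros((10, 10, 3), dtype=np.uint8)
slide[5, 5] = (0, 0, 255)   # blue nuclear stain
slide[5, 7] = (255, 0, 0)   # red hemoglobin
slide[0, 0] = (0, 0, 255)   # nucleus with no hemoglobin nearby
print(np.argwhere(nucleated_rbc_mask(slide)))
```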

In one embodiment, one may stain the nucleated red blood cell with a dye that only fluoresces in the presence of fetal hemoglobin and not maternal hemoglobin, and so remove the ambiguity as to whether a nucleated red blood cell is derived from the mother or the fetus. Some embodiments of the present disclosure may involve staining or otherwise marking nuclear material. Some embodiments of the present disclosure may involve specifically marking fetal nuclear material using fetal-cell-specific antibodies.

There are many other ways to isolate fetal cells from maternal blood, or fetal DNA from maternal blood, or to enrich samples of fetal genetic material in the presence of maternal genetic material. Some appropriate techniques are listed here for convenience, but this list is not intended to be exhaustive: using fluorescently or otherwise tagged antibodies, size exclusion chromatography, magnetically or otherwise labeled affinity tags, epigenetic differences, such as differential methylation between the maternal and fetal cells at specific alleles, density gradient centrifugation followed by CD45/14 depletion and CD71-positive selection from CD45/14-negative cells, single or double Percoll gradients with different osmolalities, or the galactose-specific lectin method.

In one embodiment of the present disclosure, the target individual is a fetus, and the different genotype measurements are made on a plurality of DNA samples from the fetus. In some embodiments of the invention, the fetal DNA samples are from isolated fetal cells, where the fetal cells may be mixed with maternal cells. In some embodiments of the invention, the fetal DNA samples are from free-floating fetal DNA, where the fetal DNA may be mixed with free-floating maternal DNA. In some embodiments, the fetal DNA may be mixed with maternal DNA in ratios ranging from 99.9:0.1 to 90:10; 90:10 to 50:50; 50:50 to 10:90; or 10:90 to 0.1:99.9.
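To illustrate how the mixing ratio shapes what is measured, the sketch below computes the expected fraction of B alleles at a SNP in a mixed sample for given maternal and fetal genotypes. It assumes both mother and fetus are disomic at the SNP and that each DNA molecule is measured without allele bias; it ignores amplification bias and allele drop out, and the function name is hypothetical. With 40 percent fetal DNA, for example, a maternal AA / fetal AB context would be expected to show a B-allele fraction of 0.2.

```python
def expected_b_fraction(maternal: str, fetal: str, fetal_fraction: float) -> float:
    """Expected B-allele fraction at one SNP in a mixed maternal/fetal DNA sample.

    maternal, fetal: two-letter disomic genotypes such as "AA", "AB", "BB".
    fetal_fraction: share of the DNA in the sample that is fetal (0 to 1).
    """
    if len(maternal) != 2 or len(fetal) != 2:
        raise ValueError("this sketch only handles disomic genotypes")
    maternal_b = maternal.count("B") / 2
    fetal_b = fetal.count("B") / 2
    return (1.0 - fetal_fraction) * maternal_b + fetal_fraction * fetal_b


for m, f in [("AA", "AB"), ("AB", "AA"), ("AB", "AB"), ("BB", "AB")]:
    print(m, f, round(expected_b_fraction(m, f, 0.4), 3))
```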

In one embodiment of the present disclosure, one may use an informatics-based technique, such as the ones described in this disclosure, to determine whether or not the cells are in fact fetal in origin. In one embodiment, one may then use such an informatics-based technique to determine the ploidy state of one or a set of chromosomes in those cells. In one embodiment, one may then use such an informatics-based technique to determine the genetic state of the cells. When applied to the genetic data of the cell, PARENTAL SUPPORT™ could indicate whether a nucleated red blood cell is fetal or maternal in origin by identifying whether the cell contains one chromosome from the mother and one from the father, which would indicate that it is fetal, or two chromosomes from the mother, which would indicate that it is maternal.
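A much simplified illustration of the fetal-versus-maternal test, not the PARENTAL SUPPORT™ algorithm itself, is sketched below. It uses only SNPs where the mother and father are opposite homozygotes (AA and BB): a fetal cell must carry one allele from each parent, while a maternal cell carries only the mother's allele. Because single-cell genotypes contain many allele drop outs, a fetal cell will show the father's allele at only some of these SNPs, so the decision cutoff (0.5 here) is a hypothetical parameter, as are the function names.

```python
from typing import Sequence, Tuple


def paternal_allele_rate(snps: Sequence[Tuple[str, str, str]]) -> float:
    """Among SNPs where mother and father are opposite homozygotes, the fraction
    at which the cell's allele call contains the father's allele.

    snps: (mother genotype, father genotype, cell call), e.g. ("AA", "BB", "AB").
    A cell call such as "A" represents a call with one allele dropped out.
    """
    informative = [
        (m, f, c) for m, f, c in snps
        if m in ("AA", "BB") and f in ("AA", "BB") and m != f
    ]
    if not informative:
        raise ValueError("no informative SNPs")
    hits = sum(f[0] in c for m, f, c in informative)
    return hits / len(informative)


def classify_cell(snps: Sequence[Tuple[str, str, str]], min_rate: float = 0.5) -> str:
    """Call a cell fetal if it shows the paternal allele often enough (hypothetical cutoff)."""
    return "fetal" if paternal_allele_rate(snps) >= min_rate else "maternal"


cell = [("AA", "BB", "AB"), ("BB", "AA", "AB"), ("AA", "BB", "A"), ("AA", "AB", "AA")]
print(paternal_allele_rate(cell), classify_cell(cell))
```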

In one embodiment, the method may be used for the purpose of paternity testing. For example, given the SNP-based genotypic information from the mother and from a man who may or may not be the genetic father, together with the measured genotypic information from the mixed sample, it is possible to determine whether the man is indeed the actual genetic father of the gestating fetus. A simple way to do this is to look at the contexts where the mother is AA and the possible father is AB or BB. If he is the father, one may expect to see his contribution half (AA|AB) or all (AA|BB) of the time, respectively. Taking into account the expected allele drop out (ADO) rate, it is straightforward to determine whether or not the observed fetal SNPs are correlated with those of the possible father.
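The context counting described above can be sketched as follows, assuming the mother is AA at every SNP passed in, that the presence or absence of a B allele in the mixed sample has already been called at each SNP, and that the ADO rate is a known per-SNP probability that the fetal paternal allele goes undetected. The sketch only reports observed rates next to the rates expected if the man is the father; it does not model B-allele calls caused by measurement noise, or the rates expected for an unrelated man, both of which a real test would need. The function name is hypothetical.

```python
from typing import Dict, Sequence, Tuple


def paternity_rates(
    observations: Sequence[Tuple[str, bool]],
    ado_rate: float,
) -> Dict[str, Tuple[float, float]]:
    """Per father context, return (observed B rate, expected B rate if he is the father).

    observations: (possible father's genotype, B allele seen in mixed sample),
    one entry per SNP at which the mother is AA.
    """
    expected_if_father = {"AB": 0.5 * (1.0 - ado_rate), "BB": 1.0 - ado_rate}
    rates = {}
    for context, expected in expected_if_father.items():
        seen = [b_seen for genotype, b_seen in observations if genotype == context]
        if seen:
            rates[context] = (sum(seen) / len(seen), expected)
    return rates


obs = [("BB", True)] * 9 + [("BB", False)] + [("AB", True)] * 4 + [("AB", False)] * 6
print(paternity_rates(obs, ado_rate=0.1))
```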

One embodiment of the present disclosure could be as follows: a pregnant woman wants to know if her fetus is afflicted with Down syndrome, and/or if it will suffer from cystic fibrosis, and she does not wish to bear a child that is afflicted with either of these conditions. A doctor takes her blood, and stains the hemoglobin with one marker so that it appears clearly red, and stains nuclear material with another marker so that it appears clearly blue. Knowing that maternal red blood cells are typically anuclear, while a high proportion of fetal cells contain a nucleus, he is able to visually isolate a number of nucleated red blood cells by identifying those cells that show both a red and a blue color. The doctor picks up these cells off the slide with a micromanipulator and sends them to a lab, which amplifies and genotypes ten individual cells. By using the genetic measurements, the PARENTAL SUPPORT™ method is able to determine that six of the ten cells are maternal blood cells, and four of the ten cells are fetal cells. If a child has already been born to the pregnant mother, PARENTAL SUPPORT™ can also be used to determine that the fetal cells are distinct from the cells of the born child, by making reliable allele calls on the fetal cells and showing that they are dissimilar to those of the born child. Note that this method is similar in concept to the paternity testing embodiment of the invention. The genetic data measured from the fetal cells may be of very poor quality, containing many allele drop outs, due to the difficulty of genotyping single cells. The clinician is able to use the measured fetal DNA along with the reliable DNA measurements of the parents to infer aspects of the genome of the fetus with high accuracy using PARENTAL SUPPORT™, thereby transforming the genetic data contained on genetic material from the fetus into the predicted genetic state of the fetus, stored on a computer.
The clinician is able to determine both the ploidy state of the fetus and the presence or absence of a plurality of disease-linked genes of interest. It turns out that the fetus is euploid and is not a carrier for cystic fibrosis, and the mother decides to continue the pregnancy.

In another embodiment, a couple in which the mother is pregnant and of advanced maternal age wants to know whether the gestating fetus has Down syndrome or some other chromosomal abnormality. The obstetrician takes a blood draw from the mother and father. A technician centrifuges the maternal sample to isolate the plasma and the buffy coat. The DNA in the buffy coat and the paternal blood sample is transformed through amplification, and the genetic data encoded in the amplified genetic material is further transformed from molecularly stored genetic data into electronically stored genetic data by running the genetic material on a SNP array to measure the parental genotypes. The plasma sample may be further processed by a method such as running a gel, or using a size exclusion column, to isolate specific size fractions of DNA. Other methods may be used to enrich the fraction of fetal DNA in the sample. An informatics-based technique that includes the invention described herein, such as PARENTAL SUPPORT™, may be used to determine the ploidy state of the fetus. It is determined that the fetus has Down syndrome. A report is printed out, or sent electronically to the pregnant woman's obstetrician, who transmits the diagnosis to the woman. The woman, her husband, and the doctor sit down and discuss the options. The couple decides to terminate the pregnancy based on the knowledge that the fetus is afflicted with a trisomic condition.

In another embodiment, a pregnant woman, hereafter referred to as ‘the mother’, may decide that she wants to know whether or not her fetus(es) are carrying any genetic abnormalities or other conditions. She may want to ensure that there are not any gross abnormalities before she is confident about continuing the pregnancy. She may go to her obstetrician, who may take a sample of her blood. He may also take a genetic sample, such as a buccal swab, from her cheek. He may also take a genetic sample from the father of the fetus, such as a buccal swab, a sperm sample, or a blood sample. The doctor may enrich the fraction of free-floating fetal DNA in the maternal blood sample. The doctor may enrich the fraction of nucleated fetal blood cells in the maternal blood sample. The doctor may use various aspects of the method described herein to determine genotypic data of the fetus. That genotypic data may include the ploidy state of the fetus, and/or the identity of one or a number of alleles in the fetus. A report may be generated summarizing the results of the prenatal diagnosis. The doctor may tell the mother the genetic state of the fetus. The mother may decide to discontinue the pregnancy based on the fact that the fetus has one or more chromosomal or genetic abnormalities, or undesirable conditions. She may also decide to continue the pregnancy based on the fact that the fetus does not have any gross chromosomal or genetic abnormalities, or any genetic conditions of interest.

Another example may involve a woman who has been artificially inseminated by a sperm donor and is pregnant. She wants to minimize the risk that the fetus she is carrying has a genetic disease. She has blood drawn at a phlebotomist, and techniques described in this disclosure are used to isolate three nucleated fetal red blood cells; a tissue sample is also collected from the mother and the genetic father. The genetic material from the fetus and from the mother and father is amplified as appropriate and genotyped using the ILLUMINA INFINIUM BEADARRAY, and the methods described herein are used to clean and phase the parental and fetal genotypes with high accuracy, as well as to make ploidy calls for the fetus. The fetus is found to be euploid, phenotypic susceptibilities are predicted from the reconstructed fetal genotype, and a report is generated and sent to the mother's physician so that they can decide which clinical decisions may be best.

Another example may involve a woman who is pregnant but, owing to having had more than one sexual partner, is not certain of the paternity of her fetus. The woman wants to know which man is the genetic father of the fetus she is carrying. She and one of her sexual partners go to the hospital and both donate a blood sample. The clinician, using the methods described in this disclosure, is able to determine the paternity of the fetus.

In some embodiments of the present disclosure, a plurality of parameters may be changed without changing the essence of the present disclosure. For example, the genetic data may be obtained using any high-throughput genotyping platform, or it may be obtained from any genotyping method, or it may be simulated, inferred or otherwise known. A variety of computational languages could be used to encode the algorithms described in this disclosure, and a variety of computational platforms could be used to execute the calculations. For example, the calculations could be executed using personal computers, supercomputers, and parallel computers.

In some embodiments of the invention, the method may be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof. Apparatus of the invention can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method steps of the invention can be performed by a programmable processor executing a program of instructions to perform functions of the invention by operating on input data and generating output. The invention can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Generally, a computer will include one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).
The results may be output in the form of a printed report, a display on a screen, or may be saved by way of a memory device that involves storage of information by way of a physical change in the substrate of the memory device, such as those listed above. A report describing the determination of the ploidy state of the fetus, either in print or electronically, may be generated that transmits the information to a health care practitioner and/or the parent. A clinical decision may be made based on the determination. In some embodiments, the clinical decision to terminate a pregnancy may be made contingent upon a determination that the fetus is aneuploid; the undesirability of the condition of aneuploidy in the fetus provides the basis for the decision to terminate the pregnancy. In some embodiments of the invention, the method includes the decision to terminate or not to terminate a pregnancy, and may also include the execution of that decision.

In one embodiment, the raw genetic material of the mother and father is transformed by way of amplification into an amount of DNA that is similar in sequence, but larger in quantity. Then, by way of a genotyping method, the genotypic data that is encoded by nucleic acids is transformed into genetic measurements that may be stored physically and/or electronically on a memory device, such as those described above. The relevant algorithms that make up the PARENTAL SUPPORT™ algorithm, relevant parts of which are discussed in detail in this disclosure, are translated into a computer program, using a programming language. Then, through the execution of the computer program on the computer hardware, the data, instead of being physically encoded bits and bytes arranged in a pattern that represents raw measurement data, become transformed into a pattern that represents a high-confidence determination of the ploidy state of the fetus. The details of this transformation will rely on the data itself and the computer language and hardware system used to execute the method described herein, but are predictable if those contexts are known. Then, the data that is physically configured to represent a high-quality ploidy determination of the fetus is transformed into a report, which may be sent to a health care practitioner. This transformation may be carried out using a printer or a computer display. The report may be a printed copy, on paper or other suitable medium, or else it may be electronic. In the case of an electronic report, it may be transmitted, it may be physically stored on a memory device at a location on the computer accessible by the health care practitioner, or it may be displayed on a screen so that it may be read. In the case of a screen display, the data may be transformed to a readable format by causing the physical transformation of pixels on the display device.
The transformation may be accomplished by way of physically firing electrons at a phosphorescent screen, or by way of altering an electric charge that physically changes the transparency of a specific set of pixels on a screen that may lie in front of a substrate that emits or absorbs photons. This transformation may be accomplished by way of changing the nanoscale orientation of the molecules in a liquid crystal, for example, from nematic to cholesteric or smectic phase, at a specific set of pixels. This transformation may be accomplished by way of an electric current causing photons to be emitted from a specific set of pixels made from a plurality of light emitting diodes arranged in a meaningful pattern. This transformation may be accomplished by any other way used to display information, such as a computer screen, or some other output device or way of transmitting information. The health care practitioner may then act on the report, such that the data in the report is transformed into an action. The action may be to continue or discontinue the pregnancy; in the latter case, a gestating fetus with a genetic abnormality is transformed into a non-living fetus. The transformations listed herein may be aggregated, such that, for example, one may transform the genetic material of a pregnant mother and the father, through a number of steps outlined in this disclosure, into a medical decision consisting of aborting a fetus with genetic abnormalities, or consisting of continuing the pregnancy. Alternately, one may transform a set of genotypic measurements into a report that helps a physician treat his pregnant patient.

In one embodiment of the invention, the method described herein can be used to determine the ploidy state of a fetus even when the host mother, i.e. the woman who is pregnant, is not the biological mother of the fetus she is carrying.

Some of the math in this disclosure makes hypotheses concerning a limited number of states of aneuploidy. In some cases, for example, only zero, one or two chromosomes are expected to originate from each parent. In some embodiments of the present disclosure, the mathematical derivations can be expanded to take into account other forms of aneuploidy, such as quadrosomy, where three chromosomes originate from one parent, as well as pentasomy, hexasomy, etc., without changing the fundamental concepts of the present disclosure. At the same time, it is possible to focus on a smaller number of ploidy states, for example, only trisomy and disomy. Note that ploidy determinations that indicate a non-whole number of chromosomes may indicate mosaicism in a sample of genetic material.
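As a minimal illustration of how such a hypothesis set might be enumerated, the sketch below represents each ploidy hypothesis as a pair (maternal copies, paternal copies). The function name `ploidy_hypotheses` and its parameters are hypothetical. Raising `max_per_parent` admits the higher-order aneuploidies mentioned above, and the optional `total_copies` filter restricts the set, for example, to disomy and trisomy only.

```python
from itertools import product


def ploidy_hypotheses(max_per_parent=2, total_copies=None):
    """Enumerate ploidy hypotheses as (maternal_copies, paternal_copies) pairs.

    Hypothetical helper: each parent may contribute 0..max_per_parent copies of
    the chromosome of interest. If total_copies is given (e.g. {2, 3}), only
    hypotheses whose total copy number is in that set are kept.
    """
    hypotheses = list(product(range(max_per_parent + 1), repeat=2))
    if total_copies is not None:
        hypotheses = [h for h in hypotheses if sum(h) in total_copies]
    return hypotheses


# Disomy and trisomy only; (2, 0) and (0, 2) are the uniparental disomy cases.
print(ploidy_hypotheses(total_copies={2, 3}))
```

Keeping uniparental disomy as separate hypotheses, rather than folding it into "disomy", is a design choice of this sketch; it lets the same enumeration serve both copy-number and parent-of-origin questions.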

In some embodiments of the present disclosure, a related individual may refer to any individual who is genetically related to, and thus shares haplotype blocks with, the target individual. Some examples of related individuals include: biological father, biological mother, son, daughter, brother, sister, half-brother, half-sister, grandfather, grandmother, uncle, aunt, nephew, niece, grandson, granddaughter, cousin, clone, the target individual himself/herself/itself, and other individuals with known genetic relationship to the target. The term ‘related individual’ also encompasses any embryo, fetus, sperm, egg, blastomere, blastocyst, or polar body derived from a related individual.

In some embodiments of the present disclosure, the target individual may refer to an adult, a juvenile, a fetus, an embryo, a blastocyst, a blastomere, a cell or set of cells from an individual or from a cell line, or any set of genetic material. The target individual may be alive, dead, frozen, or in stasis. In some embodiments of the present disclosure, as all living or once living creatures contain genetic data, the methods are equally applicable to any live or dead human, animal, or plant that inherits or inherited chromosomes from other individuals.

It is also important to note that the fetal genetic data that can be generated by measuring the amplified DNA from a small sample of fetal DNA can be used for multiple purposes. For example, it can be used for detecting aneuploidy, uniparental disomy, and unbalanced translocations, for sexing the individual, and for making a plurality of phenotypic predictions based on phenotype-associated alleles. In some embodiments, particular genetic conditions may be focused on before screening, and if certain loci are especially relevant to those genetic conditions, then a more appropriate set of SNPs, which are more likely to co-segregate with the locus of interest, can be selected, thus increasing the confidence of the allele calls of interest.
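One way to pick SNPs that are likely to co-segregate with a locus of interest, offered here only as an assumption since the disclosure does not specify a criterion, is to rank them by the recombination fraction implied by their genetic map distance to the locus. The sketch uses Haldane's map function, r = 0.5(1 - exp(-2d)) with d in Morgans. The function names, the 0.01 threshold and the centimorgan positions in the usage line are all hypothetical.

```python
import math


def haldane_recombination_fraction(distance_cm):
    """Recombination fraction from genetic map distance via Haldane's map function."""
    d = abs(distance_cm) / 100.0  # centimorgans -> Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def select_linked_snps(snp_positions_cm, locus_cm, max_recomb=0.01):
    """Return SNP ids whose recombination fraction to the locus is below max_recomb,
    ordered from most to least tightly linked."""
    scored = sorted(
        (haldane_recombination_fraction(pos - locus_cm), snp_id)
        for snp_id, pos in snp_positions_cm.items()
    )
    return [snp_id for r, snp_id in scored if r < max_recomb]


positions = {"rs1": 10.0, "rs2": 10.5, "rs3": 12.0, "rs4": 9.8}  # hypothetical cM positions
print(select_linked_snps(positions, locus_cm=10.0))
```

Haldane's function assumes no crossover interference; a different map function (for example Kosambi's) would change the ranking only slightly at these short distances.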

In some embodiments, the genetic abnormality is a type of aneuploidy, such as Down syndrome (trisomy 21), Edwards syndrome (trisomy 18), Patau syndrome (trisomy 13), Turner syndrome (45,X), and Klinefelter's syndrome (47,XXY, a male with two X chromosomes). Congenital disorders, such as those listed in the prior sentence, are commonly undesirable, and the knowledge that a fetus is afflicted with one or more phenotypic abnormalities may provide the basis for a decision to terminate the pregnancy.

In some embodiments, the phenotypic abnormality may be a limb malformation or a neural tube defect. Limb malformations may include, but are not limited to, amelia, ectrodactyly, phocomelia, polymelia, polydactyly, syndactyly, polysyndactyly, oligodactyly, brachydactyly, achondroplasia, congenital aplasia or hypoplasia, amniotic band syndrome, and cleidocranial dysostosis.

In some embodiments, the phenotypic abnormality may be a congenital malformation of the heart. Congenital malformations of the heart may include, but are not limited to, patent ductus arteriosus, atrial septal defect, ventricular septal defect, and tetralogy of Fallot.

In some embodiments, the phenotypic abnormality may be a congenital malformation of the nervous system. Congenital malformations of the nervous system include, but are not limited to, neural tube defects (e.g., spina bifida, meningocele, meningomyelocele, encephalocele and anencephaly), Arnold-Chiari malformation, the Dandy-Walker malformation, hydrocephalus, microencephaly, megencephaly, lissencephaly, polymicrogyria, holoprosencephaly, and agenesis of the corpus callosum.

In some embodiments, the phenotypic abnormality may be a congenital malformation of the gastrointestinal system. Congenital malformations of the gastrointestinal system include, but are not limited to, stenosis, atresia, and imperforate anus.

In some embodiments, the genetic abnormality is either monogenic or multigenic. Genetic diseases include, but are not limited to, Bloom Syndrome, Canavan Disease, Cystic fibrosis, Familial Dysautonomia, Riley-Day syndrome, Fanconi Anemia (Group C), Gaucher Disease, Glycogen storage disease 1a, Maple syrup urine disease, Mucolipidosis IV, Niemann-Pick Disease, Tay-Sachs disease, Beta thalassemia, Sickle cell anemia, Alpha thalassemia, Factor XI Deficiency, Friedreich's Ataxia, MCAD, Parkinson disease (juvenile), Connexin26, SMA, Rett syndrome, Phenylketonuria, Becker Muscular Dystrophy, Duchenne Muscular Dystrophy, Fragile X syndrome, Hemophilia A, Alzheimer dementia (early onset), Breast/Ovarian cancer, Colon cancer, Diabetes/MODY, Huntington disease, Myotonic Muscular Dystrophy, Parkinson Disease (early onset), Peutz-Jeghers syndrome, Polycystic Kidney Disease, and Torsion Dystonia.

In some embodiments, the systems, methods, and techniques of the present disclosure are used in methods to increase the probability of implanting an embryo obtained by in vitro fertilization that is at a reduced risk of carrying a predisposition for a genetic disease.

In an embodiment of the present disclosure, methods are disclosed for the determination of the ploidy state of a target individual where the measured genetic material of the target is contaminated with genetic material of the mother, by using the knowledge of the maternal genetic data. This is in contrast to methods that are able to determine the ploidy state of a target individual from genetic data that is noisy due to poor measurements; the contamination in that data is random. This is also in contrast to methods that are able to determine the ploidy state of a target individual from genetic data that is difficult to interpret because of contamination by DNA from unrelated individuals; the contamination in that data is genetically random. In an embodiment, the methods disclosed herein are able to determine the ploidy state of a target individual when the difficulty of interpretation is due to contamination of DNA from a parent; the contamination in this data is at least half identical to the target data, and is therefore difficult to correct for. In order to achieve this end, in an embodiment a method of the present disclosure uses the knowledge of the contaminating maternal genotype to create a model of the expected genetic measurements given a mixture of the maternal and the target genetic material, wherein the target genetic data is not known a priori. This step is not necessary where the uncertainty in the genetic data is due to random noise.
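A simple version of such a mixture model, sketched here under assumptions that are not taken from the disclosure, treats the measured signal for each allele as proportional to the number of copies of that allele in the mixture. Each maternal copy is weighted by (1 - f)/2 and each fetal copy by f/2, where f is the fetal fraction on a disomic chromosome. Multiplying numerator and denominator by 2, the expected fraction of B-allele signal is

    ((1 - f) * m_B + f * b) / (2 * (1 - f) + f * n)

where m_B is the number of B alleles in the maternal genotype, and b and n are the number of B copies and the total number of copies in the fetus under a given hypothesis. The function name is hypothetical, and amplification or measurement bias is ignored.

```python
def expected_b_fraction(maternal_b, fetal_b, fetal_copies, fetal_fraction):
    """Expected fraction of B-allele signal in a maternal/fetal DNA mixture.

    maternal_b:     number of B alleles in the (disomic) maternal genotype, 0..2
    fetal_b:        number of B alleles among the fetal copies, 0..fetal_copies
    fetal_copies:   fetal copy number of the chromosome under the hypothesis
    fetal_fraction: f, fraction of DNA from the fetus on a disomic chromosome
    Assumes signal is proportional to copy number (no amplification bias).
    """
    if not 0.0 <= fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must lie in [0, 1]")
    if not (0 <= maternal_b <= 2 and 0 <= fetal_b <= fetal_copies):
        raise ValueError("allele counts out of range")
    f = fetal_fraction
    numerator = (1.0 - f) * maternal_b + f * fetal_b      # B-allele DNA (x2)
    denominator = 2.0 * (1.0 - f) + f * fetal_copies      # total DNA (x2)
    if denominator == 0.0:
        raise ValueError("no DNA in the mixture for this chromosome")
    return numerator / denominator
```

For a homozygous AA mother and a disomic AB fetus at f = 0.1 this gives 0.05. The same SNP in a trisomic fetus carrying one B copy gives 0.1 / 2.1, about 0.048. Differences of this size are what a hypothesis test on the mixed sample has to resolve.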

In an embodiment, a method for determining the copy number of a chromosome of interest in a target individual, using genotypic measurements made on genetic material from the target individual, wherein the genetic material of the target individual is mixed with genetic material from the mother of the target individual, comprises obtaining genotypic data for a set of SNPs of the parents of the target individual; making genotypic measurements for the set of SNPs on a mixed sample that comprises DNA from the target individual and also DNA from the mother of the target individual; creating, on a computer, a set of ploidy state hypotheses for the chromosome of interest of the target individual; determining, on the computer, the probability of each of the hypotheses given the genetic measurements of the mixed sample and the genetic data of the parents of the target individual; and using the determined probabilities of each hypothesis to determine the most likely copy number of the chromosome of interest in the target individual. In an embodiment, the target individual and the parents of the target individual are human test subjects.
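The steps above can be sketched as a Bayesian comparison of hypotheses. This is a minimal illustration, not the disclosure's model. It assumes the measurements are per-SNP allele counts modeled as binomial, a known fetal fraction f, and each inherited copy drawn independently and uniformly from that parent's two alleles, which ignores meiotic error type and crossovers. All function names and the `eps` error floor are hypothetical.

```python
import math
from itertools import product


def fetal_b_distribution(mother, father, m_copies, p_copies):
    """P(number of fetal B copies) under a (maternal, paternal) copy hypothesis.

    mother/father are 2-tuples of alleles such as ("A", "B"); each inherited
    copy is drawn independently and uniformly from that parent's two alleles.
    """
    draws = [mother] * m_copies + [father] * p_copies
    dist = {}
    for alleles in product(*draws):
        b = sum(a == "B" for a in alleles)
        dist[b] = dist.get(b, 0.0) + 0.5 ** len(draws)
    return dist


def binom_pmf(k, n, q):
    log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(q) + (n - k) * math.log(1.0 - q))
    return math.exp(log_p)


def hypothesis_posteriors(snps, f, hypotheses, priors=None, eps=1e-3):
    """Posterior probability of each (maternal, paternal) copy hypothesis.

    snps: list of (mother, father, a_count, b_count) on the chromosome of interest.
    f:    fetal fraction, assumed known (e.g. estimated beforehand).
    """
    if priors is None:
        priors = {h: 1.0 / len(hypotheses) for h in hypotheses}
    log_post = {}
    for m_copies, p_copies in hypotheses:
        n_fetal = m_copies + p_copies
        total = math.log(priors[(m_copies, p_copies)])
        for mother, father, a_count, b_count in snps:
            m_b = sum(a == "B" for a in mother)
            lik = 0.0
            # Average the measurement likelihood over possible fetal genotypes.
            for b, prob in fetal_b_distribution(mother, father, m_copies, p_copies).items():
                q = ((1 - f) * m_b + f * b) / (2 * (1 - f) + f * n_fetal)
                q = min(max(q, eps), 1 - eps)  # floor for genotyping error
                lik += prob * binom_pmf(b_count, a_count + b_count, q)
            total += math.log(max(lik, 1e-300))
        log_post[(m_copies, p_copies)] = total
    top = max(log_post.values())  # log-sum-exp normalisation
    z = sum(math.exp(v - top) for v in log_post.values())
    return {h: math.exp(v - top) / z for h, v in log_post.items()}


# Hypothetical data: mother AA, father BB, 18.2% B signal at f = 0.2.
snps = [(("A", "A"), ("B", "B"), 818, 182)] * 5
post = hypothesis_posteriors(snps, 0.2, [(1, 1), (2, 1), (1, 2)])
print(max(post, key=post.get))
```

With these synthetic counts the paternal-trisomy hypothesis (1, 2), whose expected B fraction is 0.4 / 2.2, about 0.18, should dominate disomy (expected 0.10) and maternal trisomy (expected about 0.09).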

In an embodiment, a computer implemented method for determining the copy number of a chromosome of interest in a target individual, using genotypic measurements made on genetic material from the target individual, where the genetic material of the target individual is mixed with genetic material from the mother of the target individual, comprises obtaining genotypic data for a set of SNPs of the parents of the target individual; making genotypic measurements for the set of SNPs on a mixed sample that comprises DNA from the target individual and also DNA from the mother of the target individual; creating, on a computer, a set of ploidy state hypotheses for the chromosome of interest of the target individual; determining, on the computer, the probability of each of the hypotheses given the genetic measurements of the mixed sample and the genetic data of the parents of the target individual; and using the determined probabilities of each hypothesis to determine the most likely copy number of the chromosome of interest in the target individual.

In an embodiment, a method for determining the copy number of a chromosome of interest in a target individual, using genotypic measurements made on genetic material from the target individual, wherein the genetic material of the target individual is mixed with genetic material from the mother of the target individual, comprises obtaining genotypic data for a set of SNPs of the mother of the target individual; making genotypic measurements for the set of SNPs on a mixed sample that comprises DNA from the target individual and also DNA from the mother of the target individual; creating, on a computer, a set of ploidy state hypotheses for the chromosome of interest of the target individual; determining, on the computer, the probability of each of the hypotheses given the genetic measurements of the mixed sample and the genetic data of the mother of the target individual; and using the determined probabilities of each hypothesis to determine the most likely copy number of the chromosome of interest in the target individual.
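When only the maternal genotype is available, the paternal contribution must be modeled some other way. One common approach, used here as an assumption rather than as the disclosure's stated method, is to average over the possible paternal genotypes using a population B-allele frequency p under Hardy-Weinberg proportions: AA with probability (1 - p)^2, AB with 2p(1 - p), and BB with p^2. All paternal copies are taken from the same unknown father, so in a paternal trisomy the two paternal copies remain correlated. The resulting fetal B-copy distribution could then replace a parent-based distribution in a hypothesis calculation. The function names are hypothetical.

```python
from itertools import product


def _copy_draw_distribution(sources):
    """P(number of B copies) when one copy is drawn uniformly from each source genotype."""
    dist = {}
    for alleles in product(*sources):
        b = sum(a == "B" for a in alleles)
        dist[b] = dist.get(b, 0.0) + 0.5 ** len(sources)
    return dist


def fetal_b_distribution_mother_only(mother, p_b, m_copies, p_copies):
    """Fetal B-copy distribution given the mother's genotype and a population B frequency.

    The father's genotype is unknown and is averaged over Hardy-Weinberg proportions;
    all paternal copies come from the same (unknown) father.
    """
    father_priors = {
        ("A", "A"): (1.0 - p_b) ** 2,
        ("A", "B"): 2.0 * p_b * (1.0 - p_b),
        ("B", "B"): p_b ** 2,
    }
    out = {}
    for father, weight in father_priors.items():
        if weight == 0.0:
            continue
        sources = [mother] * m_copies + [father] * p_copies
        for b, prob in _copy_draw_distribution(sources).items():
            out[b] = out.get(b, 0.0) + weight * prob
    return out


# Mother AA, B-allele frequency 0.5, disomic fetus (one copy from each parent).
print(fetal_b_distribution_mother_only(("A", "A"), 0.5, 1, 1))
```

Marginalizing over the father's genotype, rather than drawing each paternal copy independently with probability p, is what keeps paternal-trisomy hypotheses from being treated as if the extra copy came from an unrelated individual.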

In an embodiment, a computer implemented method for determining the copy number of a chromosome of interest in a target individual, using genotypic measurements made on genetic material from the target individual, where the genetic material of the target individual is mixed with genetic material from the mother of the target individual, comprises obtaining genotypic data for a set of SNPs of the mother of the target individual; making genotypic measurements for the set of SNPs on a mixed sample that comprises DNA from the target individual and also DNA from the mother of the target individual; creating, on a computer, a set of ploidy state hypotheses for the chromosome of interest of the target individual; determining, on the computer, the probability of each of the hypotheses given the genetic measurements of the mixed sample and the genetic data of the mother of the target individual; and using the determined probabilities of each hypothesis to determine the most likely copy number of the chromosome of interest in the target individual.

Combinations of the Aspects of the Present Disclosure

As noted previously, given the benefit of this disclosure, there are more aspects and embodiments that may implement one or more of the systems, methods, and features disclosed herein. All patents, patent applications, and published references cited herein are hereby incorporated by reference in their entirety. It will be appreciated that several of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations, or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.

What is claimed is:
 1. A method for determining the risk of aneuploidy of at least one chromosome or chromosome segment of interest in the genome of a gestating fetus, the method comprising: a) binding a plurality of probes to maternal and fetal DNA from a blood, serum, or plasma sample from the mother of the gestating fetus, at each of a plurality of loci on a chromosome or chromosome segment of interest, and at each of a plurality of polymorphic loci on at least one chromosome that is expected to be disomic in both the mother and the fetus where the mother is homozygous for a first allele at that locus, and the father is (i) heterozygous for the first allele and a second allele or (ii) homozygous for a second allele at that locus; b) amplifying each of the plurality of loci having a bound probe to obtain amplified products comprising the plurality of loci on the at least one chromosome or chromosome segment of interest and the plurality of polymorphic loci on the at least one chromosome that is expected to be disomic; c) determining a bias of a technique used to measure the amount of the amplified products, wherein the bias is used to statistically correct the measured genetic data at the plurality of loci on the chromosome or chromosome segment of interest and the measured quantity of each allele at the plurality of polymorphic loci on the at least one chromosome that is expected to be disomic; d) performing microarray analysis to obtain the amount of amplified products derived from the fetal DNA in the blood, serum, or plasma sample using (i) the statistically corrected, measured quantity of the second allele or (ii) the statistically corrected, measured quantity of the first and second alleles for each of the polymorphic loci on the at least one chromosome that is expected to be disomic; and e) determining, on a computer, the risk of aneuploidy of the at least one chromosome or chromosome segment of interest in the genome of the fetus using the measured amount of amplified products derived from the fetal DNA and the measured genetic data for the at least one chromosome or chromosome segment of interest.
 2. The method of claim 1, wherein a maximum likelihood estimate is used to determine the amount of amplified products derived from the fetal DNA.
 3. The method of claim 2, wherein the maximum likelihood estimate is performed using a gradient descent method.
 4. The method of claim 2, wherein the maximum likelihood estimate is performed using a Newton-Raphson optimization method.
 5. The method of claim 1, wherein prior probabilities of aneuploidy given maternal age and/or gestational age of the mother are used in determining the risk of aneuploidy, in addition to the statistically corrected, measured genetic data from the at least one chromosome or chromosome segment of interest and the determined amount of amplified products derived from the fetal DNA in the sample.
 6. The method of claim 1, further comprising aggregating the statistically corrected, measured genetic data from the plurality of loci on the at least one chromosome or chromosome segment of interest to determine an aggregated value, and using the aggregated value to determine the risk of aneuploidy of the at least one chromosome or chromosome segment of interest.
 7. The method of claim 1, wherein determining the risk of aneuploidy of the at least one chromosome or chromosome segment of interest comprises comparing a mean value for the statistically corrected, measured genetic data from the plurality of loci on each of the at least one chromosome or chromosome segment of interest to a mean value for the measured genetic data from the plurality of loci on one or more of the at least one chromosome that is expected to be disomic.
 8. The method of claim 1, wherein at least one of the at least one chromosome or chromosome segment of interest is selected from the group consisting of chromosome 13, chromosome 18, chromosome 21, chromosome X, chromosome Y, and combinations thereof.
 9. The method of claim 1, wherein the fetal DNA in the sample is not preferentially enriched over the maternal DNA before performance of the method. 
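Claims 2 to 4 refer to a maximum likelihood estimate of the fetal-derived amount, computed for example by Newton-Raphson optimization. A minimal Newton-Raphson sketch follows, under assumptions not stated in the claims. Measurements are taken to be B-allele counts modeled as binomial, and only loci where the mother is homozygous AA are used. For a BB father the fetus is AB, with an expected B fraction of f/2. For an AB father the fetus is AA or AB with probability 1/2 each, and the AA case is modeled by a small fixed error rate. With s = k/f - (n - k)/(2 - f) and w the posterior probability that the fetus is AB at a locus, the derivatives are l' = sum(w s) and l'' = sum(w s' + w(1 - w) s^2). The bias correction of claim 1 is omitted, and all names are hypothetical.

```python
import math


def _log_binom(k, n, q):
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log(1.0 - q))


def estimate_fetal_fraction(loci, f0=0.3, err=1e-3, tol=1e-10, max_iter=100):
    """Newton-Raphson MLE of fetal fraction f from informative loci.

    loci: list of (father_genotype, b_count, depth) at loci where the mother is AA;
          father_genotype is "BB" (fetus AB) or "AB" (fetus AA or AB, 1/2 each).
    """
    f = f0
    for _ in range(max_iter):
        grad = hess = 0.0
        for father, k, n in loci:
            s = k / f - (n - k) / (2.0 - f)              # d/df log P(k | fetus AB)
            ds = -k / f ** 2 - (n - k) / (2.0 - f) ** 2
            if father == "BB":
                w = 1.0
            else:
                # Posterior probability that the fetus is AB rather than AA.
                log_aa = _log_binom(k, n, err)
                log_ab = _log_binom(k, n, f / 2.0)
                w = 1.0 / (1.0 + math.exp(min(log_aa - log_ab, 700.0)))
            grad += w * s
            hess += w * ds + w * (1.0 - w) * s ** 2
        # Newton step where the log-likelihood is locally concave, else a small ascent step.
        f_new = f - grad / hess if hess < 0 else f + 0.01 * math.copysign(1.0, grad)
        if f_new <= 0.0:          # stay inside (0, 1) by halving toward the bound
            f_new = f / 2.0
        elif f_new >= 1.0:
            f_new = (f + 1.0) / 2.0
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f


print(round(estimate_fetal_fraction([("BB", 50, 1000), ("AB", 50, 1000), ("AB", 1, 1000)]), 4))
```

In this toy input, the AB-father locus with a single B read is explained almost entirely by an AA fetus, so it barely moves the estimate. The other two loci each imply f = 2k/n = 0.1. A gradient-descent variant, as in claim 3, would reuse `grad` with a fixed or line-searched step instead of dividing by `hess`.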